BIO 542: Machine Learning for Biomedical Applications, Monsoon Semester 2017, IIIT-Delhi

Implementation of Improvement on Identification of ATP Binding Residues of Proteins from Primary Sequence

SVM

Results

Window Size	Accuracy	Precision	Recall	Specificity	Fscore	MCC	FPR
16	0.6726	0.6829	0.6447	0.7006	0.6632	0.3459	0.2993
17	0.6856	0.7001	0.6495	0.7218	0.6739	0.3723	0.2781
18	0.6870	0.7056	0.6417	0.7322	0.6721	0.3756	0.2677
19	0.6794	0.6969	0.6348	0.7239	0.6644	0.3147	0.3164
20	0.6898	0.7057	0.6513	0.7284	0.6774	0.3576	0.2913
21	0.6870	0.7083	0.6357	0.7382	0.6701	0.3459	0.2993
22	0.6861	0.7075	0.6345	0.7376	0.6690	0.3742	0.2699
23	0.6719	0.6802	0.6238	0.7200	0.6657	0.3742	0.2617
24	0.6958	0.7162	0.6486	0.7430	0.6887	0.3934	0.2569

Neural Network

Results

Window Size	Validation Loss	Validation Accuracy
16	1.9900	0.4673
17	2.4828	0.4733
18	2.4199	0.4665
19	2.6093	0.4710
20	2.6340	0.4665
21	2.9711	0.4733
22	2.9295	0.4628
23	3.0979	0.4816
24	3.3554	0.4695

Project Details

We carried out a brief implementation of the paper “Identification of ATP binding residues of a protein from its primary sequence” and attempted to improve on its methods by applying different machine learning techniques to an extended dataset with optimized model parameters.
We first reproduced the paper's approach in Python on the linked dataset, using the SVC class from scikit-learn's svm module, and evaluated it with different window sizes.
This gave a maximum cross-validation accuracy of around 64% on a balanced dataset, at window size 17.
We then extracted 55 more non-redundant ATP-binding proteins and applied an SVM and a neural network to the extended dataset. The neural network failed to reach a comparable accuracy (around 50%), but the SVM gave some improvement; its best result, at window size 24, is repeated below. See Implementation at Github.

Window Size	Accuracy	Precision	Recall	Specificity	Fscore	MCC	FPR
24	0.6958	0.7162	0.6486	0.7430	0.6887	0.3934	0.2569
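
The window-based setup described in this section can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the padding symbol `X`, the placement of the extra residue on the left for even window sizes, the one-hot encoding, and random undersampling as the balancing method are all assumptions, and the function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumption: symbol used to pad windows at the sequence termini


def extract_windows(sequence, labels, window_size):
    """Return one (window, label) pair per residue.

    The label is that of the residue the window is built around
    (1 = ATP binding, 0 = non-binding). For even window sizes the
    extra position is placed on the left (an assumption).
    """
    left = window_size // 2
    right = window_size - left - 1
    padded = PAD * left + sequence + PAD * right
    return [(padded[i:i + window_size], labels[i])
            for i in range(len(sequence))]


def one_hot(window):
    """Encode a window as a flat 0/1 vector of length 20 * len(window).

    Padding and unknown residues are encoded as all zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def balance(samples, seed=0):
    """Randomly undersample the larger class so both classes are equal."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    n = min(len(positives), len(negatives))
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Toy sequence and labels, for illustration only
    seq = "MKTAYIAKQR"
    lab = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
    data = balance(extract_windows(seq, lab, window_size=5))
    X = [one_hot(window) for window, _ in data]
    y = [label for _, label in data]
    print(len(X), len(X[0]), sum(y))
```

The resulting `X` and `y` lists are in the shape scikit-learn estimators expect, so they could be passed to `sklearn.svm.SVC` and scored with `sklearn.model_selection.cross_val_score` for each window size.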