BIO 542: Machine Learning for Biomedical Applications, Monsoon Semester 2017, IIIT-Delhi


Implementation of Improvement on Identification of ATP Binding Residues of Proteins from Primary Sequence



Project Details

We implemented the method from the paper “Identification of ATP binding residues of a protein from its primary sequence” and attempted to improve on it by applying different machine learning techniques, an extended dataset, and optimized model parameters.
We first reproduced the paper's approach in Python on the linked dataset, using the scikit-learn library and its svm.SVC classifier, and evaluated it over a range of window sizes.
The maximum cross-validation accuracy was around 64% on a balanced dataset, at window size 17.
We then extracted 55 more non-redundant ATP-binding proteins and applied an SVM and a neural network. The neural network reached only around 50% accuracy, which is not comparable, but the SVM gave some improvement. See the implementation on GitHub.
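
The window-based setup described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the padding symbol `X`, the one-hot encoding, the function names, and the example sequence are all assumptions made for the sketch. The source also does not say how even window sizes were handled, so the sketch simply places the extra position on the right.

```python
# Hypothetical sketch of per-residue window extraction; not the project's code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def sliding_windows(sequence, window_size=17):
    """Return one window per residue of the sequence.

    For odd sizes the residue sits in the centre. For even sizes the
    extra position goes to the right (an assumption of this sketch).
    """
    left = (window_size - 1) // 2
    right = window_size - 1 - left
    padded = PAD * left + sequence + PAD * right
    return [padded[i:i + window_size] for i in range(len(sequence))]


def one_hot(window):
    """Encode a window as a flat 0/1 vector: 21 slots per position."""
    alphabet = AMINO_ACIDS + PAD
    vector = []
    for residue in window:
        slot = [0] * len(alphabet)
        index = alphabet.find(residue)
        # Unknown symbols are mapped to the padding slot.
        slot[index if index >= 0 else len(alphabet) - 1] = 1
        vector.extend(slot)
    return vector


# Made-up example sequence, used only to show the shapes involved.
windows = sliding_windows("MKTAYIAK", window_size=5)
features = [one_hot(w) for w in windows]
```

Each feature vector, paired with a binding or non-binding label for its residue, could then be passed to a classifier such as scikit-learn's `SVC` and scored with cross-validation.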

Window Size  Accuracy  Precision  Recall  Specificity  F-score  MCC     FPR
24           0.6958    0.7162     0.6486  0.7430       0.6887   0.3934  0.2569
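
For reference, the metrics in the table follow the standard definitions for a binary confusion matrix. The sketch below shows those definitions; the function name and the example counts are made up for illustration and are not the project's results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    specificity = ratio(tn, tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "fscore": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
        "fpr": ratio(fp, fp + tn),  # equals 1 - specificity
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

Because FPR is 1 minus specificity, the last column of the table carries no extra information: 0.7430 + 0.2569 is 1 up to rounding.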