We implemented the method from the paper
“Identification of ATP binding residues of a protein from
its primary sequence” and attempted to improve on it by
trying different machine learning techniques, an extended
dataset, and tuned model parameters.
We first reproduced the paper's approach in Python on the
linked dataset, using the SVC class from scikit-learn's svm
module, and evaluated it across different window sizes.
We obtained maximum cross-validation accuracies of
around 64% on a balanced dataset, with window size 17.
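
The window-based setup described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the one-hot encoding, the `X` padding symbol, the RBF kernel, and the helper names `extract_windows`, `one_hot`, and `cross_validated_accuracy` are all assumptions made for the example. Each residue is represented by the window of residues around it, and the flattened window becomes one training row for the SVM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions past the sequence ends
ALPHABET = AMINO_ACIDS + PAD


def extract_windows(sequence, window_size):
    """Return one window per residue, padded with PAD at both termini.

    For even window sizes the extra position goes to the left of the
    target residue (an assumption; the source does not say how even
    windows are aligned).
    """
    left = window_size // 2
    right = window_size - left - 1
    padded = PAD * left + sequence + PAD * right
    return [padded[i:i + window_size] for i in range(len(sequence))]


def one_hot(window):
    """Flatten a window into a binary vector of length len(window) * 21."""
    vector = []
    for residue in window:
        index = ALPHABET.find(residue)
        if index < 0:
            # Treat non-standard residues like padding.
            index = ALPHABET.index(PAD)
        row = [0] * len(ALPHABET)
        row[index] = 1
        vector.extend(row)
    return vector


def cross_validated_accuracy(sequences, labels, window_size=17, folds=5):
    """Mean cross-validation accuracy of an SVC on windowed residues.

    `sequences` is a list of protein strings and `labels` a parallel list
    of per-residue 0/1 strings (1 = ATP-binding); both formats are
    hypothetical. scikit-learn is imported here so the encoding helpers
    above stay usable without it.
    """
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = [], []
    for sequence, residue_labels in zip(sequences, labels):
        windows = extract_windows(sequence, window_size)
        for window, label in zip(windows, residue_labels):
            X.append(one_hot(window))
            y.append(int(label))
    scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=folds, scoring="accuracy")
    return scores.mean()
```

Because binding residues are much rarer than non-binding ones, the rows would typically be balanced (for example by undersampling the negatives) before calling `cross_validated_accuracy`; that step is left out of the sketch.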
We then extracted 55 more non-redundant ATP-binding proteins and applied both an SVM and a neural network to the extended dataset.
The neural network failed to reach a comparable accuracy (around 50%), but the SVM gave some improvement. See the implementation on
GitHub.

| Window Size | Accuracy | Precision | Recall | Specificity | Fscore | MCC | FPR |
|---|---|---|---|---|---|---|---|
| 24 | 0.6958 | 0.7162 | 0.6486 | 0.7430 | 0.6887 | 0.3934 | 0.2569 |
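
For reference, the metrics reported above can all be derived from a binary confusion matrix. The sketch below assumes the standard textbook definitions (the source does not state its formulas), and the function name `binary_metrics` is hypothetical. Values averaged over cross-validation folds will not necessarily satisfy these identities exactly.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)           # equals 1 - specificity
    fscore = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: ranges from -1 to 1, 0 is chance level.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "fscore": fscore,
        "mcc": mcc,
        "fpr": fpr,
    }
```

The sketch does not guard against empty classes; a zero denominator (for example, no positive predictions) raises `ZeroDivisionError` and would need explicit handling in real use.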