IDENTIFYING MOLECULAR FUNCTIONS OF DYNEIN MOTOR PROTEINS USING EXTREME GRADIENT BOOSTING ALGORITHM WITH MACHINE LEARNING
Cytoskeletal motor proteins, dynein among them, drive the active transport of most cytoplasmic proteins and vesicles and are the driving force behind muscle contraction. Understanding how dyneins are used in cells relies on structural knowledge. Cytoskeletal motor proteins differ in molecular role and structure and belong to three superfamilies: kinesin, dynein and myosin. Loss of function of specific molecular motor proteins has been linked to a number of human diseases, such as Charcot-Marie-Tooth disease and kidney disease. A precise model for identifying dynein motor proteins is therefore needed to help scientists understand their molecular roles and design therapeutic targets based on their influence on human disease. An accurate and efficient computational methodology is highly desirable, especially one built on cutting-edge machine learning methods. In this article, we propose a machine learning method based on extreme gradient boosting (XGBoost) for predicting the molecular functions of the cytoskeletal motor protein superfamilies. We obtain the initial feature set (All) by extracting protein features from the sequence and from the evolutionary information of the amino acid residues, encoded with BLOSUM62. Our XGBoost model achieves an accuracy of 0.8676, a precision of 0.8768, a sensitivity of 0.760, a specificity of 0.9752 and a Matthews correlation coefficient (MCC) of 0.7536. Compared with other state-of-the-art methods, our method shows substantial improvements on many of the evaluation metrics. This study offers an effective model for the classification of dynein proteins and lays a foundation for further research to improve the efficiency of protein functional classification.
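The five evaluation metrics reported above all follow from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives. Ratios with a zero denominator
    are returned as 0.0 (a common convention, assumed here).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: ranges from -1 to 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the paper's data).
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when the positive and negative classes are imbalanced, which is one reason it is commonly reported alongside accuracy in protein classification studies.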
This work is licensed under a Creative Commons Attribution 4.0 International License.
Papers accepted for publication become copyright of the Editorial Office, Journal of Mountain Area Research, Karakoram International University Gilgit, Gilgit-15100, Pakistan and authors will be asked to sign a copyright transfer agreement. Articles cannot be published until a signed copyright transfer agreement form has been received.