UBI-XGB: IDENTIFICATION OF UBIQUITIN PROTEINS USING MACHINE LEARNING MODEL


Sikandar Rahu, Ali Ghulam, Ali Farman, Dhani Bux Talpur, Mir Sajjad Hussain Talpur, Erum Saba, Zulfikar Ahmed Maher, Saima Tunio

Abstract

Ubiquitination, a pervasive modification underlying proteasome-mediated protein degradation, controls apoptosis and is a major factor in the breakdown of proteins and the development of cell disorders, and it has been the focus of a recent line of research. Protein turnover and ubiquitination are closely related processes. Recent developments in proteomic technology have sparked renewed interest in identifying ubiquitination sites in a number of human disorders, which have been studied experimentally and clinically. Because experimental identification of protein ubiquitination sites is typically labor- and time-intensive, it is vital to develop reliable computational predictors. In this paper, we propose Ubipro-XGBoost, a machine learning method based on multi-view features for predicting ubiquitination sites. Protein sequences are first encoded as feature matrices using the Dipeptide Deviation from Expected Mean (DDE) encoding technique; a second feature extraction model uses dipeptide composition (DPC). These features are then fed into an extreme gradient boosting (XGBoost) classifier. As more experimentally verified ubiquitination sites become available, the resulting predictive algorithm can locate lysine ubiquitination sites in large-scale proteome data. Under 5-fold cross-validation, Ubipro-XGBoost achieved an accuracy of 0.914, a sensitivity of 0.836, a specificity of 0.992, and a Matthews correlation coefficient (MCC) of 0.839 with the DPC model, and an accuracy of 0.909, a sensitivity of 0.839, a specificity of 0.979, and an MCC of 0.829 with the DDE model. The findings demonstrate that the proposed technique, Ubipro-XGBoost, outperforms conventional ubiquitination prediction methods and offers new guidance for ubiquitination site identification.
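
The two sequence encodings named in the abstract can be illustrated with a minimal Python sketch. It follows the commonly used definitions of DPC (frequency of each of the 400 dipeptides) and DDE (dipeptide frequency standardized by a codon-based theoretical mean and variance); it is not the authors' code, and the function names `dpc` and `dde`, the constant names, and the example peptide are assumptions made for illustration only.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered amino acid pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Number of codons encoding each amino acid in the standard genetic code.
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = 61  # 64 codons minus the 3 stop codons


def dpc(seq):
    """Dipeptide composition: count of each dipeptide divided by (L - 1)."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


def dde(seq):
    """Dipeptide deviation from expected mean, one value per dipeptide."""
    n_pairs = len(seq) - 1
    features = []
    for dipeptide, observed in zip(DIPEPTIDES, dpc(seq)):
        # Theoretical mean from the codon counts of the two residues.
        t_mean = (CODONS[dipeptide[0]] / TOTAL_CODONS) * (
            CODONS[dipeptide[1]] / TOTAL_CODONS
        )
        # Theoretical variance of the dipeptide frequency.
        t_var = t_mean * (1 - t_mean) / n_pairs
        features.append((observed - t_mean) / math.sqrt(t_var))
    return features


# Hypothetical peptide window, used only to show the call pattern.
window = "MKQLEDKVEELLSKNYHLENE"
dpc_vector = dpc(window)
dde_vector = dde(window)
```

Each call returns a 400-dimensional vector. Stacking these vectors for all sequence windows would give the feature matrix that a classifier such as XGBoost takes as input.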

Article Details

How to Cite
RAHU, Sikandar et al. UBI-XGB: IDENTIFICATION OF UBIQUITIN PROTEINS USING MACHINE LEARNING MODEL. Journal of Mountain Area Research, [S.l.], v. 8, p. 14-26, dec. 2022. ISSN 2518-850X. Available at: <https://journal.kiu.edu.pk/index.php/JMAR/article/view/167>. Date accessed: 31 jan. 2023. doi: https://doi.org/10.53874/jmar.v8i0.167.
Section
Mathematical Sciences

