UBI-XGB: IDENTIFICATION OF UBIQUITIN PROTEINS USING MACHINE LEARNING MODEL
Ubiquitination is a pervasive process that mediates proteasomal protein degradation, controls apoptosis, and plays a crucial role in protein turnover and in the development of cellular disorders; it has been the focus of a recent line of research. Recent developments in proteomic technology have renewed interest in identifying ubiquitination sites in a number of human disorders, which have been studied both experimentally and clinically. Because experimental identification of protein ubiquitination sites is typically labor- and time-intensive, reliable computational predictors are needed. In this paper, we propose Ubipro-XGBoost, a multi-view, feature-based machine learning method for predicting lysine ubiquitination sites, which can be applied to large-scale proteome data as more experimentally verified ubiquitination sites become available. First, protein sequences are encoded as feature matrices using the Dipeptide Deviation from Expected Mean (DDE) encoding technique; a second feature extraction model is based on dipeptide composition (DPC). These features are then fed into an extreme gradient boosting (XGBoost) classifier. The method was evaluated by 5-fold cross-validation, with the area under the receiver operating characteristic curve (AUC) among the reported measures. With the DPC model, Ubipro-XGBoost achieved an accuracy of 0.914, a sensitivity of 0.836, a specificity of 0.992, and a Matthews correlation coefficient (MCC) of 0.839; with the DDE model, it achieved an accuracy of 0.909, a sensitivity of 0.839, a specificity of 0.979, and an MCC of 0.829.
These results indicate that Ubipro-XGBoost outperforms conventional ubiquitination prediction methods and offers new guidance for ubiquitination site identification.
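The abstract does not give the encoding formulas, so the following is only a minimal sketch under stated assumptions. It uses the commonly used definitions of DPC (the frequency of each of the 400 dipeptides) and DDE (the deviation of each dipeptide frequency from a codon-based theoretical mean, scaled by the theoretical standard deviation), together with the standard confusion-matrix formulas for the reported metrics. The function names `dpc`, `dde`, and `binary_metrics` are hypothetical, and how the authors extract sequence fragments around candidate lysines is not specified here.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order shared by both encodings.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = sum(CODON_COUNTS.values())


def dpc(seq):
    """Dipeptide composition: frequency of each dipeptide among the len(seq)-1 pairs."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped (assumption)
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


def dde(seq):
    """Dipeptide deviation from expected mean: (Dc - Tm) / sqrt(Tv) per dipeptide."""
    n_pairs = len(seq) - 1
    features = []
    for dipeptide, dc in zip(DIPEPTIDES, dpc(seq)):
        # Theoretical mean from the codon usage of the two residues.
        tm = (CODON_COUNTS[dipeptide[0]] / TOTAL_CODONS) * (
            CODON_COUNTS[dipeptide[1]] / TOTAL_CODONS
        )
        # Theoretical variance of the dipeptide frequency.
        tv = tm * (1 - tm) / n_pairs
        features.append((dc - tm) / math.sqrt(tv))
    return features


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

Each sequence fragment yields a 400-dimensional vector under either encoding. Stacking these vectors row by row gives the kind of feature matrix that could be passed to a gradient boosting classifier such as `xgboost.XGBClassifier` and scored fold by fold with `binary_metrics`. The classifier settings used in the paper are not stated in this abstract.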
This work is licensed under a Creative Commons Attribution 4.0 International License.
Papers accepted for publication become copyright of the Editorial Office, Journal of Mountain Area Research, Karakoram International University Gilgit, Gilgit-15100, Pakistan and authors will be asked to sign a copyright transfer agreement. Articles cannot be published until a signed copyright transfer agreement form has been received.