Journal of Mountain Area Research, Vol. 8, 2023

UBI-XGB: IDENTIFICATION OF UBIQUITIN PROTEINS USING MACHINE LEARNING MODEL

Rahu Sikander1*, Ali Ghulam2, Ali Farman3, Dhani Bux Talpur4, Mir Sajjad Hussain Talpur2, Erum Saba2, Zulfikar Ahmed Maher2, Saima Tunio2

1. School of Computer Science and Technology, Xidian University, Xi’an 710071, China
2. Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan
3. Elementary and Secondary Education, Peshawar, Khyber Pakhtunkhwa, Pakistan
4. Department of Computer Science, University of Gwaddar, Gwaddar, Balochistan

ABSTRACT

Ubiquitination, a pervasive modification that mediates proteasomal protein degradation and controls apoptosis, plays a major role in the breakdown of proteins and the development of cell disorders, and has been the focus of a recent line of research. Protein turnover and ubiquitination are closely related processes. Because experimental identification of protein ubiquitination sites is typically labor- and time-intensive, it is vital to develop reliable computational predictors. In this paper, we propose a computational method named Ubipro-XGBoost, a multi-view feature-based technique for predicting ubiquitination sites. First, we encode protein sequence features into matrix data using the Dipeptide Deviation from Expected Mean (DDE) feature encoding technique. Second, we use a feature extraction model based on dipeptide composition (DPC). These features are then fed into the extreme gradient boosting (XGBoost) classifier. Recent developments in proteomic technology have sparked renewed interest in the identification of ubiquitination sites in a number of human disorders, which have been studied experimentally and clinically.
As more experimentally verified ubiquitination sites become available, we developed a predictive algorithm that can locate lysine ubiquitination sites in large-scale proteome data. Ubipro-XGBoost achieved an accuracy of 0.914, sensitivity of 0.836, specificity of 0.992, and MCC of 0.839 under 5-fold cross-validation with the DPC model, and an accuracy of 0.909, sensitivity of 0.839, specificity of 0.979, and MCC of 0.829 under 5-fold cross-validation with the DDE model. The findings demonstrate that the proposed technique, Ubipro-XGBoost, outperforms conventional ubiquitination prediction methods and offers fresh guidance for ubiquitination site identification.

KEYWORDS: Ubiquitin proteins; DDE; DPC; Ubipro-XGBoost; Machine learning

*Corresponding author: (Email: sikander@stu.xidian.edu.cn)

Full length article. Sikander et al., J. mt. area res. 08 (2023) 14-26. https://doi.org/10.53874/jmar.v8i0.167

1. INTRODUCTION

This study combines qualitative and quantitative analysis in a computational tool. Ubiquitin, first described in [1], is a small protein of 76 amino acids [2]. Protein ubiquitination is a common post-translational modification in which ubiquitin is attached to a substrate protein. An increase in ubiquitination can affect how a protein behaves in a variety of ways [6, 7]; for example, it can instruct the proteasome to digest the protein [3, 4]. The process is also connected to inflammation, cell change, and the immune response [5].
Ubiquitination has been linked to cell change, the immunological response, and the inflammatory response [8]. Ubiquitin is a small regulatory protein that is present in practically all eukaryotic tissues and is the central component of the ubiquitination modification process. Ubiquitination proceeds in three steps: activation, conjugation (binding), and ligation (connection) [9]. Because ubiquitination is critical to understanding protein regulation and molecular mechanisms, identifying potential ubiquitination sites is essential. Computational methods are urgently needed that can detect protein ubiquitination sites more quickly and precisely than traditional methods such as ChIP-chip analysis and mass spectrometry.

A considerable amount of research [10] on the mechanism of ubiquitination has focused on the identification of ubiquitination sites. Ubiquitination is quick and reversible. Conventional experimental techniques include high-throughput mass spectrometry (MS), ubiquitin antibodies, and ubiquitin-binding proteins [11, 12], used in combination with liquid chromatography and mass spectrometry [13]. UbiProber was developed by Chen to combine sequence information with physicochemical properties and amino acid composition in order to build generic models for eukaryotic proteomes and species-specific models for proteomes from a variety of species. ESA-UbiSite [14] incorporated physicochemical features into an SVM and used ESA to choose the most effective negative dataset from the entire dataset.

These existing machine learning algorithms perform well on small-scale data, but significant obstacles remain for large-scale protein ubiquitination site prediction. First, the artificially designed features have a weakness.
There are currently no methods that do not rely on expert knowledge for feature extraction, which results in incomplete and biased feature vectors [15, 16]. Second, the features are heterogeneous. To boost accuracy, most existing prediction methods converged on a single feature type and ignored the inherent heterogeneity among features. Third, the numbers of positive and negative samples are unequal. Only a limited number of lysine residues in the entire proteome can be ubiquitinated, which makes protein ubiquitination site prediction an extremely imbalanced problem [17]. Existing approaches for discovering probable ubiquitination sites do not handle such imbalance well.

Deep learning, a recent trend in machine learning for massive datasets, is thought to be a possible answer to these issues. A number of deep learning networks have been used to analyze genomic and proteomic data successfully, but deep learning has yet to be applied to the prediction of protein ubiquitination sites. A graphical representation of ubiquitin can be used to illustrate the roles of new molecules in large signal networks [18]. The distinctive patterns of molecules from a certain class can be shown by a large number of interconnected proteins [19, 20].

Our ubiquitin predictor is based on the XGBoost algorithm. Machine learning approaches such as eXtreme Gradient Boosting (XGBoost) have been used in the literature to predict protein structures [21]. Networks that take low-level features as inputs produce higher-level features at the next layer. XGBoost-based techniques are used in both computer vision and natural language processing. In biomedical data analysis as well, helped by recent advances in processing power [23], XGBoost-based methods have been found to outperform the standard predictive methods used in bioinformatics and cheminformatics [22].
Ubiquitin prediction is an area where the XGBoost-based predictor performs exceptionally well. Its prediction performance is also compared with other machine learning classifiers: a deep neural network (DNN), AdaBoost (ABC), and Random Forest (RF). To find the optimum feature extraction approach, we also use feature extraction protocols that have been successful in tackling diverse biological challenges. We believe that DDE and DPC are the most effective approaches for extracting features from this dataset.

2. MATERIALS AND METHODS

2.1 Datasets

Machine learning models can be simplified by employing a quantitative approach that includes the use of a dataset. Our data come from the UniProtKB and NCBI databases. In total, 825 protein sequences were retained, of which 375 are ubiquitin (positive) proteins and the remaining 450 are non-ubiquitin (negative) proteins. This is a class of proteins used to model subcellular distributions [24]. The obtained datasets were preprocessed based on protein–pathway and protein–non-pathway interactions. Data were stored in CSV format and the parameters of our proposed model were established. Sequences of ubiquitination-specific proteins were used as positive samples. Training datasets were balanced by a random selection of positive and negative samples [25]. The source database contains proteins from a variety of organisms, but only human proteins that were specifically implicated in human pathways were investigated. CD-HIT [26] was applied as a similarity-filtering step, after which 450 non-ubiquitin proteins remained. This preprocessing removed redundant sequences and finalized 825 proteins: 375 ubiquitin proteins and 450 non-ubiquitin proteins (Table 1).

Table 1. Collected data as ubiquitin and non-ubiquitin sequences.

| | Original data | Similarity <30% | Cross-validation |
|---|---|---|---|
| Ubiquitin proteins | 550 | 375 | 375 |
| Non-ubiquitin proteins | 650 | 450 | 450 |
| Total | 1200 | 825 | 825 |

2.2 Features Extraction for Ubiquitin Protein Association

Feature extraction, the process of turning protein sequence information into numerical data, is crucial to the classification task. Sequence-based features, physicochemical property-based features, and evolutionary-derived features are used in this study to extract information from protein sequences. First, we encoded protein sequence features into matrix data using the Dipeptide Deviation from Expected Mean (DDE) feature encoding technique. Second, we encoded features with the dipeptide composition (DPC) model. A two-dimensional sparse matrix of size 20×20 was obtained and flattened into a one-dimensional vector. With this method of random projection, an effective measurement matrix was used to generate a small feature set. Because of this, a new method of extracting compressive sensing features has been developed. The XGBoost, DDE, and DPC feature profiles were studied, and an approach for classifying pathway-specific proteins was devised. The system comprises data gathering, feature extraction, model development, and model evaluation. Figure 1 depicts the system's flowchart, and its details are explained below. In this way, a new technique was created to detect and classify proteins that are specific to human pathways.

2.3 Features Encoded By DDE

To distinguish ubiquitinated from non-ubiquitinated proteins, feature extraction based on amino acid pairs is studied using the dipeptide deviation from expected mean (DDE).
The starting point is the dipeptide composition (Dc) of a protein sequence. We encoded physicochemical and evolutionary features from the ubiquitin datasets. In earlier work, DDE feature profile vectors were shown to be more effective than other feature representations for identifying linear peptides linked with pathogen protection. Following earlier studies, dipeptide frequency variations were measured using dipeptide composition features in this study. Three parameters were used to build the DDE feature vector: the dipeptide composition (Dc), the theoretical mean (Tm), and the theoretical variance (Tv). The dipeptide composition of dipeptide i is

$$ D_c(i) = \frac{n_i}{N} \tag{1} $$

where n_i is the number of occurrences of dipeptide i and N = L − 1 is the number of dipeptides in a peptide P of length L. There are 400 dipeptide features (20 standard amino acids, giving 20 × 20 dipeptides), although not all of them occur in every sequence. The theoretical mean T_m(i) is

$$ T_m(i) = \frac{C_{i1}}{C_N} \times \frac{C_{i2}}{C_N} \tag{2} $$

where C_{i1} is the number of codons for the first amino acid of dipeptide i, C_{i2} is the number of codons for the second amino acid, and C_N is the total number of possible codons, excluding the three stop codons. Because T_m(i) does not depend on the peptide, it was computed once for the 400 dipeptides rather than recalculated for each sequence. The theoretical variance T_v(i) of dipeptide i is

$$ T_v(i) = \frac{T_m(i)\,\bigl(1 - T_m(i)\bigr)}{N} \tag{3} $$

where T_m(i) is the theoretical mean from Eq. (2) and N is again L − 1, the number of dipeptides in peptide P. Finally, DDE(i) is

$$ DDE(i) = \frac{D_c(i) - T_m(i)}{\sqrt{T_v(i)}} \tag{4} $$

DDE is calculated for each of the 400 dipeptides, which gives the 400-dimensional feature vector

$$ DDE_P = \{DDE(1), \ldots, DDE(400)\}, \quad i = 1, 2, \ldots, 400 \tag{5} $$

Figure 1. The proposed XGBoost ubiquitin protein framework model.

2.4 Features encoded with DPC

The dipeptide composition (DPC) is computed from pairs of consecutive residues. The representation is limited to 400 elements. This sequence representation provides information on both the amino acid composition and the local order. The DPC feature extraction procedure was applied in order to extract the best foundational features. A dipeptide is a pair of adjacent amino acids in a protein sequence, and DPC records how often each pair occurs. For example, the sequence MALMAC contains the dipeptides MA (two), AL (one), LM (one), and AC (one). The total number of feature elements is 400 dipeptides. To standardize the DPC features, we divided the counts by (N − 1), where N is the sequence length [27]. The frequency of two adjacent amino acids captures information beyond the amino acid composition alone. Because of this, the dipeptide composition is well suited to situations requiring local information, such as homology information.

$$ f(j) = \frac{\#\ \text{of dipeptide } j}{N - 1} \times 100 \tag{6} $$

2.5 Proposed model

We build a novel machine learning model for protein association prediction, the XGBoost ubiquitin protein sequence model. Two feature extraction techniques are applied in order to remove unnecessary features before the model is constructed. The features from the two encoding models are then used as inputs to Ubipro-XGBoost and to three other machine learning classifiers for comparison. Hybrid features can also be developed by combining feature spaces. For this purpose, 10-fold cross-validation tests are also carried out. Figure 1 illustrates the framework of our proposed method. This section proposes a machine learning technique and feature extraction model for predicting ubiquitination sites.
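The two encodings of Sections 2.3 and 2.4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dipeptide_counts`, `dpc`, `dde`), the ordering of the 400 dipeptides, and the choice to skip pairs containing non-standard residues are all hypothetical. The codon counts are those of the standard genetic code (61 sense codons).

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 dipeptides in a fixed (assumed) order: AA, AC, ..., YY.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

# Codons per amino acid in the standard genetic code; they sum to 61 (C_N).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
C_N = 61


def dipeptide_counts(seq):
    """Count each of the 400 dipeptides over adjacent residue pairs."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:  # assumption: pairs with non-standard residues are skipped
            counts[pair] += 1
    return counts


def dpc(seq, scale=100.0):
    """Eq. (6): dipeptide count divided by (sequence length - 1), times 100."""
    n_pairs = len(seq) - 1
    counts = dipeptide_counts(seq)
    return [scale * counts[d] / n_pairs for d in DIPEPTIDES]


def dde(seq):
    """Eqs. (1)-(5): 400-dimensional deviation-from-expected-mean vector."""
    n_pairs = len(seq) - 1  # N = L - 1
    counts = dipeptide_counts(seq)
    features = []
    for d in DIPEPTIDES:
        d_c = counts[d] / n_pairs                            # Eq. (1)
        t_m = (CODONS[d[0]] / C_N) * (CODONS[d[1]] / C_N)    # Eq. (2)
        t_v = t_m * (1.0 - t_m) / n_pairs                    # Eq. (3)
        features.append((d_c - t_m) / sqrt(t_v))             # Eq. (4)
    return features


if __name__ == "__main__":
    vec = dpc("MALMAC")
    print(vec[DIPEPTIDES.index("MA")])  # MA occurs twice among 5 pairs -> 40.0
    print(len(dde("MALMAC")))           # 400
```

For the example sequence MALMAC, dipeptides that occur more often than the codon-based expectation (such as MA) receive positive DDE values, while absent dipeptides receive negative ones.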
Figure 1 shows the framework of the proposed Ubipro-XGBoost method. In the first step (green box), we collected sequences from the databases mentioned above, removed redundant similar sequences, and finalized the ubiquitin-positive and ubiquitin-negative protein datasets. In the second step (blue box), we extracted features with the DDE and DPC models and then normalized them. In the third step (brown box), we applied the XGBoost algorithm for classification and used 10-fold cross-validation to evaluate the model's ability to predict outcomes. We then compared the performance of our proposed XGBoost algorithm with three other machine learning classifiers. According to simulation results, the proposed strategy performs reasonably well when compared with some cutting-edge techniques [28, 29]. Extreme gradient boosting (XGBoost) is an ensemble algorithm that Chakraborty and Elzarka recently showed to produce more accurate energy models than artificial neural networks and degree-day ordinary least squares regression [30] [31].

3. PERFORMANCE EVALUATION METHODS

The training dataset is used to tune the parameters of the models with a tenfold cross-validation approach, and the independent set is used to test the model [32]. The underlying models were evaluated using sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC). The four entries of the confusion matrix obtained after prediction are true positive (TP), false positive (FP), false negative (FN), and true negative (TN). Sensitivity, specificity, precision, accuracy, F-score, and MCC were among the metrics used to evaluate the overall prediction performance of the different classification models. Previous research has used them, with a greater value indicating better performance (Jing and Dong, 2017).
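The metrics named above can all be computed from the four confusion-matrix counts. The sketch below is illustrative only: the function name `classification_metrics` and the example counts are hypothetical and are not results from this study. It assumes every denominator except the MCC denominator is non-zero.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the arithmetic.
    for name, value in classification_metrics(tp=90, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which is why it is often preferred when the positive and negative classes differ in size.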
The performance metrics are defined as follows.

$$ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{7} $$

$$ \text{Specificity} = \frac{TN}{TN + FP} \tag{8} $$

$$ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{9} $$

$$ \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{10} $$

4. RESULTS AND DISCUSSION

We used a benchmark dataset of positive samples (375 sequences) and negative samples (375 sequences). In the first phase, we compared various matrices in order to obtain the best DDE and DPC matrix for our model. We found that the DPC matrix (20×20) was the best one for handling the imbalanced data. Next, we fed all 400 extracted features into the XGBoost algorithm. There are many ways in which the experiments might be developed. After the DPC model, we then employed the DDE model.

4.1 Ubiquitin and Non-Ubiquitin Sequence for the AAC

The amino acid composition (AAC) of the ubiquitin and non-ubiquitin sequences was determined by counting amino acids. All 20 amino acids contribute significantly to the two datasets. With a few notable exceptions, there are no significant differences between the two categories of data. C and P are the amino acids with the highest concentrations across the proteins, so these amino acids are important for identifying ubiquitin proteins. In light of the various properties of these amino acids, our model is able to predict ubiquitin proteins accurately, as shown in Table 2.

Table 2. Performance metrics obtained by XGBoost.

4.2 Efficiency Comparison between XGBoost and Shallow Machine Learning for Ubiquitin

Multiple machine learning algorithms were tested in order to identify ubiquitin proteins. Four machine learning classifiers were used in our study: XGBoost, AdaBoost [33], Random Forest, and DNN.
Our XGBoost [34] was compared with a deep neural network (DNN) implementation based on perceptrons. As Table 3 and Figure 2 show, we used the optimum parameters in all of our trials so that each classifier could be compared with the others. We found that our XGBoost performed better than the other standard machine learning approaches in the same experimental framework. Our Ubipro-XGBoost, in particular, was built on a distinct dataset.

Table 3. Performance comparison of ML classifiers by DDE model

| ML classifier | ACC | Precision | Sensitivity | Specificity | MCC | F1 |
|---|---|---|---|---|---|---|
| RF-DDE | 0.752 | 0.759 | 0.767 | 0.738 | 0.519 | 0.742 |
| DNN-DDE | 0.827 | 0.870 | 0.805 | 0.849 | 0.688 | 0.779 |
| AdaBoost-DDE | 0.878 | 0.874 | 0.844 | 0.912 | 0.767 | 0.832 |

Figure 2. Proposed model compared with other classifiers

Table 4. Performance comparison of the other three ML classifiers by DPC model

| ML classifier | ACC | Precision | Sensitivity | Specificity | MCC | F1 |
|---|---|---|---|---|---|---|
| RF-DPC | 0.779 | 0.769 | 0.783 | 0.775 | 0.570 | 0.758 |
| DNN-DPC | 0.841 | 0.838 | 0.831 | 0.852 | 0.705 | 0.798 |
| AdaBoost-DPC | 0.901 | 0.941 | 0.852 | 0.950 | 0.821 | 0.861 |

Table 4 shows how the XGBoost hybrid features can be used to demonstrate the classifier's predictive power. The Random Forest classifier, on the other hand, performed very well in this mixed-feature comparison: it is reported as more accurate than XGBoost on the hybrid features, with 93.53% accuracy. Table 4 and Figure 3 show the results of the comparison with the three machine learning classifiers.

Figure 3. Proposed model compared with other classifiers

4.3 ROC (AUC) Comparative Performance by DDE and DPC

The analysis follows prior investigations of the binary classification problem addressed in our study. Our results were found to be accurate and consistent across the majority of machine learning classification algorithms. Besides the ROC curve and the ROC (AUC), researchers also employ other metrics, such as the algorithm's accuracy or the confusion matrix.

Figure 4 shows the results of evaluating the Ubipro-XGBoost output using the ROC (AUC) curve. It appears that our Ubipro-XGBoost model performs well even with multi-class classification; however, more data are needed to investigate this finding in more depth. There were no overfitting issues with our cross-validated Ubipro-XGBoost model, which had an accuracy of 0.914.

Figure 4. ROC (AUC) with the DDE model and ROC (AUC) with the DPC model

The ROC (AUC) score on the DPC cross-validation dataset was 0.94, and the ROC (AUC) score on the DDE dataset was also 0.94.

4.4 ROC (AUC) Score Comparison with the Other Three Classifiers Using DDE and DPC

We calculated ROC (AUC) scores for several machine learning approaches, as shown in Figure 5. Each method's prediction rate is considerably higher than random prediction, and the XGBoost classifier performs better than the others. In Figure 5, the different ubiquitin feature sets are represented by separate ROC (AUC) curves. With the DDE model, AdaBoost achieved a ROC (AUC) of 0.92, RF 0.86, DNN 0.85, and our proposed XGBoost model 0.94, which is better than the other classifiers.
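For readers unfamiliar with the ROC (AUC) score used in this comparison, it equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are hypothetical; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels; scores: predicted scores, where a
    higher score means "more likely positive". Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the baseline against which the values reported in this section are read. The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of a few hundred sequences.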
With the DPC model, AdaBoost achieved a ROC (AUC) of 0.93, RF 0.88, DNN 0.89, and our proposed XGBoost model 0.94, which is better than the other classifiers.

Figure 5. ROC curves comparing performance with the DDE and DPC methods.

5. DISCUSSION

The Ubipro-XGBoost predictor is trained on the most comprehensive database of protein ubiquitin modifications. Ubiquitination is predicted with machine learning classification models, with XGBoost as the proposed classifier. The best result for the XGBoost classification model was an accuracy score of 0.836. The XGBoost model achieved a precision of 0.892 with DPC and 0.881 with DDE, and the best accuracy was also obtained with the XGBoost model. According to our experimental tests, XGBoost therefore ranks first overall and the AdaBoost classification model second. DNN was the third-best classifier by ROC (AUC) on the DPC [35] and DDE [36] models for ubiquitin proteins [37], as shown in Figure 5. Ubipro-XGBoost achieved an accuracy of 0.914, sensitivity of 0.836, specificity of 0.992, and MCC of 0.839 under 5-fold cross-validation with the DPC model, and an accuracy of 0.909, sensitivity of 0.839, specificity of 0.979, and MCC of 0.829 under 5-fold cross-validation with the DDE model.

CONCLUSION

The XGBoost algorithm was used to build Ubipro-XGBoost, a predictor for the accurate identification of ubiquitin proteins. Compared with earlier predictors, we attained state-of-the-art performance on the benchmark dataset. Three main conclusions can be drawn. To begin, the XGBoost algorithm identifies ubiquitin proteins consistently and accurately when compared with other algorithms.
To further enhance model performance, the DDE and DPC feature extraction methods were used to optimize the feature vectors, extracting the most significant features from a large number of candidate features and increasing the model's accuracy. In addition, the SHAP technique was used to provide explanations for individual samples, a significant advantage over other sequence-based ubiquitin predictors, which are limited in their ability to provide relevant explanations. The contribution of DPC features to the direction of the final prediction is explained, as is the importance of paying attention to some specific identities and a range of other traits. The end results demonstrated that Ubipro-XGBoost obtained satisfactory and promising performance that is stable and credible. There are still unknowns about ubiquitin proteins, such as how many of them there are and what they do, which limits the accuracy of the model. In addition, possible connections among the features need to be investigated. In future work, ubiquitin and non-ubiquitin proteins will be separated by finding and extracting as many features as possible from a vast amount of data.

DECLARATIONS

Funding: No funding was received for this study.
Conflicts of interest/Competing interests: The authors declare no conflict of interest or competing interests.
Data availability: Not applicable.
Code availability: Not applicable.
Authors’ contributions: Rahu Sikander, Ali Ghulam and Farman Ali jointly contributed to the design of the study. Rahu Sikander conceptualized the review and finalized the manuscript. Ali Ghulam and Dhani Bux Talpur wrote the initial manuscript. Farman Ali helped to draft the manuscript. Ashfaq Ahmed revised the manuscript and Rahu Sikander polished the English expression. All of the authors have read and approved the final manuscript.

REFERENCES

[1] Goldstein G, Scheid M, Hammerling U, Schlesinger DH, Niall HD, Boyse EA.
Isolation of a polypeptide that has lymphocyte-differentiating properties and is probably represented universally in living cells. Proc Natl Acad Sci U S A. 72(1) (1975) 11–5.
[2] Wilkinson KD. The discovery of ubiquitin-dependent proteolysis. Proc Natl Acad Sci U S A. 102(43) (2005) 15280–2.
[3] Pickart CM, Eddins MJ. Ubiquitin: structures, functions, mechanisms. Biochim Biophys Acta. 1695(1–3) (2004) 55–72.
[4] Welchman RL, Gordon C, Mayer RJ. Ubiquitin and ubiquitin-like proteins as multifunctional signals. Nat Rev Mol Cell Biol. 6(8) (2005) 599–609.
[5] Peng JM, Schwartz D, Elias JE, Thoreen CC, Cheng DM, Marsischky G, et al. A proteomics approach to understanding protein ubiquitination. Nat Biotechnol. 21(8) (2003) 921–6.
[6] Herrmann J, Lerman LO, Lerman A. Ubiquitin and ubiquitin-like proteins in protein regulation. Circ Res. 100(9) (2007) 1276–91.
[7] Welchman R, Gordon C, Mayer RJ. Ubiquitin and ubiquitin-like proteins as multifunctional signals. Nat Rev Mol Cell Biol. 6(8) (2005) 599–609.
[8] Schwartz AL, Ciechanover A. The ubiquitin-proteasome pathway and pathogenesis of human diseases. Annu Rev Med. 50 (1999) 57–74.
[9] Zhong J, Shaik S, Wan L, Tron AE, Wang Z, Sun L, Anushka H, Wei W. SCF beta-TRCP targets MTSS1 for ubiquitination-mediated destruction to regulate cancer cell proliferation and migration. Oncotarget. 4(12) (2013) 2339–53.
[10] B. Yu, Z. Yu, C. Chen, A. Ma, B. Liu, B. Tian, Q. Ma, DNNAce: prediction of prokaryote lysine acetylation sites through deep neural networks with multi-information fusion, Chemomet. Intell. Lab. 200 (2020) 103999.
[11] G. Xu, J.S. Paige, S.R. Jaffrey, Global analysis of lysine ubiquitination by ubiquitin remnant immunoaffinity profiling, Nat. Biotechnol. 28 (2010) 868–873.
[12] W. Kim, E.J. Bennett, E.L. Huttlin, A. Guo, J. Li, A. Possemato, M.E. Sowa, R. Rad, J. Rush, M.J. Comb, J.W. Harper, S.P. Gygi, Systematic and quantitative assessment of the ubiquitin-modified proteome, Mol. Cell. 44 (2011) 325–340.
[13] P. Radivojac, V. Vacic, C. Haynes, R.R. Cocklin, A. Mohan, J.W. Heyen, M.G. Goebl, L.M. Iakoucheva, Identification, analysis, and prediction of protein ubiquitination sites, Proteins 78 (2010) 365–380.
[14] Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY. UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines. BMC Syst Biol. 10 (Suppl 1) (2016) 6.
[15] Nguyen VN, Huang KY, Huang CH, Lai KR, Lee TY. A new scheme to characterize and identify protein ubiquitination sites. IEEE/ACM Trans Comput Biol Bioinform. 14(2) (2017) 393–403.
[16] Qiu WR, Xiao X, Lin WZ, Chou KC. iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn. 33(8) (2015) 1731–42.
[17] Chen X, Qiu JD, Shi SP, Suo SB, Huang SY, Liang RP. Incorporating key position and amino acid residue features to identify general and species-specific ubiquitin conjugation sites. Bioinformatics. 29(13) (2013) 1614–22.
[18] Wang JR, Huang WL, Tsai MJ, Hsu KT, Huang HL, Ho SY. ESA-UbiSite: accurate prediction of human ubiquitination sites by identifying a set of effective negatives. Bioinformatics. 33(5) (2017) 661–8.
[19] Yuan Y, Xun G, Jia K, Zhang A. A multi-view deep learning method for epileptic seizure detection using short-time Fourier transform. ACM; 2017.
[20] Yuan Y, Xun G, Jia K, Zhang A. A Novel Wavelet-based Model for EEG Epileptic Seizure Detection using Multi-context Learning. In: Hu XH, Shyu CR, Bromberg Y, Gao J, Gong Y, Korkin D, Yoo I, Zheng JH, editors. 2017 IEEE International Conference on Bioinformatics and Biomedicine; (2017). p. 694–9.
[21] Sánchez, R., & Sali, A. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proceedings of the National Academy of Sciences, 95(23), (1998) 13597-13602.
[22] Husnjak, K., & Dikic, I. Ubiquitin-binding proteins: decoders of ubiquitin-mediated cellular functions. Annual Review of Biochemistry, 81, (2012) 291-322.
[23] Agrahari, A. K., Bose, P., Jaiswal, M. K., Rajkhowa, S., Singh, A. S., Hotha, S. ... & Tiwari, V. K. Cu (I)-catalyzed click chemistry in glycoscience and their diverse applications. Chemical Reviews, 121(13), (2021) 7638-7956.
[24] Wang, M., Cui, X., Li, S., Yang, X., Ma, A., Zhang, Y., & Yu, B. DeepMal: Accurate prediction of protein malonylation sites by deep neural networks. Chemometrics and Intelligent Laboratory Systems, 207, (2020) 104175.
[25] Liu, Y., Jin, S., Song, L., Han, Y., & Yu, B. Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier. Journal of Molecular Graphics and Modelling, (2021) 107962.
[26] Alsanousi WA, Ahmed NY, Hamid EM, Elbashir MK, Musa MEM, Wang J, et al. A novel deep learning-assisted hybrid network for plasmodium falciparum parasite mitochondrial proteins classification. PLoS ONE 17(10): e0275195. https://doi.org/10.1371/journal.pone.0275195 (2022).
[27] Min, S., Lee, B. & Yoon, S. Brief. Bioinform. 18, (2016) 851–869.
[28] Kandaswamy, K.K., Pugalenthi, ., Kalies, K.U., Hartmann, E., Martinetz, T., 2013.
[29] Saravanan, V. & Gautham, N. Harnessing computational biology for exact linear B-cell epitope prediction: A novel amino acid composition-based feature descriptor. OMICS 19, (2015) 648–658.
[30] V. Saravanan and N. Gautham, "Harnessing computational biology for exact linear B-Cell epitope prediction: A novel amino acid composition based feature descriptor," OMICS, A J. Integrative Biol., vol. 19, no. 10, (2015) pp. 648–658, doi: 10.1089/omi.2015.0095.
[31] V. Saravanan and N. Gautham, "BCIgEPRED—A dual-layer approach for predicting linear IgE epitopes," Mol. Biol., vol. 52, no. 2, (2018) pp. 285–293, doi: 10.1134/S0026893318020127.
[32] L. Zou, C. Nan, and F. Hu, "Accurate prediction of bacterial type IV secreted effectors using amino acid composition and PSSM profiles," Bioinformatics, vol. 29, no. 24, (2013) pp. 3135–3142, doi: 10.1093/bioinformatics/btt554.
[33] Ghulam, A., Sikander, R., Ali, F., Swati, Z. N. K., Unar, A., & Talpur, D. B. Accurate prediction of immunoglobulin proteins using machine learning model. Informatics in Medicine Unlocked, 29, (2022) 100885.
[34] Sikander, R., Ghulam, A. & Ali, F. XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set. Sci Rep 12, 5505 (2022).
[35] Ghualm, Ali, et al. "Identification of Pathway-Specific Protein Domain by Incorporating Hyperparameter Optimization Based on 2D Convolutional Neural Network." IEEE Access 8 (2020) 180140-180155.
[36] Ghulam, A., M. Memon, M. Hyder, Z. A. Maher, A. Unar, Z. N. K. Swati, D. B. Talpur, R. Sikander, I. Ullah, and A. Farman. "Identification of Novel Protein Sequencing SARS CoV-2 Coronavirus Using Machine Learning." Bioscience Research (2021) 47-58.
[37] Sikander, R., Arif, M., Ghulam, A., Worachartcheewan, A., Thafar, M. A., & Habib, S. Identification of the ubiquitin–proteasome pathway domain by hyperparameter optimization based on a 2D convolutional neural network. Frontiers in Genetics, 13 (2022).

Received: 19 Sep. 2022. Revised/Accepted: 10 Nov. 2022.

This work is licensed under a Creative Commons Attribution 4.0 International License. http://creativecommons.org/licenses/by/4.0/