ProAll-D: protein allergen detection using long short term memory - a deep learning approach


ADMET & DMPK 10(3) (2022) 231-240; doi: https://doi.org/10.5599/admet.1335  

 
Open Access : ISSN : 1848-7718  

http://www.pub.iapchem.org/ojs/index.php/admet/index   

Original scientific paper 

Pallavi M. Shanthappa*, Rakshitha Kumar* 

Department of Computer Science, Amrita School of Arts and Sciences, Mysuru Campus, Amrita Vishwa Vidyapeetham, 
India 

*Corresponding Authors:  E-mail: palls.ms@gmail.com;  rakshitha.k.k1999@gmail.com.  

Received: April 03, 2022; Revised: July 15, 2022; Available online: August 21, 2022 

 
Abstract 

Background: An allergic reaction is an overreaction of the immune system to a previously encountered,
typically benign molecule, frequently a protein. Allergic reactions can result in rashes, itching, mucous
membrane swelling, asthma, coughing, and other symptoms. A wide range of principles and methods have been
applied in bioinformatics to predict allergens. The sequence similarity approach based on FAO/WHO criteria
has a very low positive predictive value, making it difficult to predict possible allergens. Method: This
work proposes the use of a deep learning model, LSTM (Long Short-Term Memory), to overcome the limitations
of traditional approaches and of lower-performing machine learning models in predicting the allergenicity
of dietary proteins. A total of 2,427 allergens and 2,427 non-allergens from a variety of sources, including
the Central Science Laboratory and the NCBI, were used. The data were divided 80:20 for training and
testing. All techniques were implemented in Python. Five E-descriptors were used to describe the protein
sequences of allergens and non-allergens: E1 (hydrophilic character of peptides), E2 (length), E3
(propensity to form helices), E4 (abundance and dispersion), and E5 (propensity of beta strands). The ACC
transformation was applied to these descriptors to convert the variable-length protein sequences to
uniform length. A total of eight machine learning techniques were considered. Results: The accuracy was
64.14 % for Gaussian Naive Bayes, 49.2 % for the Radius Neighbours Classifier, 85.8 % for the Bagging
Classifier, 76.9 % for ADA Boost, 76.13 % for Linear Discriminant Analysis, 84.2 % for Quadratic
Discriminant Analysis, 90 % for the Extra Tree Classifier, and 91.5 % for LSTM. Conclusion: With an AUC
value of 91.5 %, the LSTM is regarded as the best model for predicting allergens. A web server called
ProAll-D has been created that identifies novel allergens using the LSTM approach. Users can use the link
https://doi.org/10.17632/tjmt97xpjf.1 to access the ProAll-D server and data.

©2022 by the authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons 
Attribution license (http://creativecommons.org/licenses/by/4.0/). 

Keywords 

Allergen prediction; ACC transformation; LSTM model; Gaussian naive bayes; Classifier; Extra tree classifier;
Bagging classifier; ADA boost; Linear discriminant analysis; Quadratic discriminant analysis

 
Introduction 

Allergy, often described as an autoimmune disorder, is a clinical condition characterized by the immune 

system’s sensitivity to normally innocuous elements. The substance that causes allergy is known as an 

allergen. Allergens can be dust, pollen, cosmetics, and food. In food, allergy is usually caused by proteins. 

Proteins are an essential part of our diet, but some proteins can also be harmful to some individuals. One of 


the reasons this matters today is that the use of genetically modified (transgenic) food crops is increasing

rapidly. Such crops must therefore be assessed before they are introduced into the food chain.

Allergy can be innate, acquired, predictable, and at times rapid. Allergic reactions are caused by an antibody 

called immunoglobulin E (IgE), which causes hyperactivity in white blood cells such as mast cells and 

basophils, resulting in the production of inflammatory chemicals like histamine. Apart from symptoms such 

as uneasiness, sneezing, wheezing, and swelling, allergic reactions can also lead to life-threatening situations. 

As a result, assessing them is critical to protect society's wellbeing. 

According to the Food and Agriculture Organization, a protein is a potential allergen if it has a homology 

of six successive amino acids or a sequence identity of more than 35 percent [1]. Poms et al. developed PCR 

(Polymerase Chain Reaction) in conjunction with ELISA (Enzyme-Linked Immunosorbent Assay) to find 

potential allergens in foods or an indicator to detect the existence of the offending foods [2]. AlgPred uses 

MEME/MAST motif search to predict allergens and SVM for classification based on single and dipeptide 

composition [3]. AllerHunter uses SVM as the classifying method and an incremental pairwise sequence 

comparison indexing approach to identify probable allergens and allergic cross-reactivity in proteins. The 

paired vectorization system models the essential elements of allergens that are involved in cross-reactivity 

[4]. Using Pseudo-Amino Acid Composition (PseAAC) and SVM, a new technique for identifying and predicting 

allergenic proteins was developed. It looked at sequence vector representations derived from sequence 

attributes. The minimum Redundancy Maximum Relevance (mRMR) feature selection approach was used to assess

the impact and efficiency of each feature [5]. Vijayakumar et al. developed an innovative fuzzy rule-based

approach to investigate protein allergenicity when the similarity between known allergens and non-allergens 

is low for characterizing allergens. The results of five different modules were combined: computational 

classifier, pattern analysis, global comparison with allergens, FAO management framework, and prototype 

approach [6].  

AllerTop was developed as an alignment-free allergen prediction method. Protein properties were defined 

using Z-descriptors. The ACC transformation was used to convert the variable-length strings to uniform-

length strings. The KNN (K-Nearest Neighbor) algorithm was applied for classification and consistently 

outperformed other algorithms [7]. AllergenFP was designed to distinguish allergens from non-allergens

using a sequence descriptor-based fingerprint approach. The strings of varying lengths were

transformed into vectors of uniform length using the ACC transformation. The vectors were then converted

to binary fingerprints and compared using Tanimoto coefficients [8]. For allergenicity

prediction, Dimitrov et al. developed Artificial Neural Network-based algorithms. As a final step before the 

ANN modeling, the vectors were transformed into binary fingerprints [9]. AllerTop v2 is a highly accurate 

allergen prediction model based on amino acid characteristics. The ACC procedure was used to transform 

variable-length strings into uniform-length vectors. In comparison to other classification approaches, the KNN 

algorithm produced a stable output graph [10]. Allerdictor is a pattern-based allergy prediction software that 

interprets sequence data as a textual information and detects allergens through text classification using 

support vector machines [11]. Cross-React was a computational framework approach for predicting 

allergenic protein’s cross-reactivity. It is based on the hypothesis that surface regions with peptide 

compositions similar to an antigen in a known allergen can be detected on three-dimensional structures of 

probable allergens [12].  

A study of GE (Genetically Engineered) crops found they are as safe as conventional food crops. Screening 

the recombinant protein for predicting potential allergens is one of the assessment procedures for GE crops. 

It implies that there is currently no clear parameter that can be used to anticipate the allergenicity of

proteins [13]. AllerCatPro analyses potential allergenic protein based on the three-dimensional structural 



similarity. Shifting between sequential frame similarities to B-cell epitope-like 3D surface similarity with 

anticipated architectures was investigated, and so an entropy-adjusted hexamer hit method was also 

investigated [14]. Pallavi et al. used computational analysis to compare three medications and identified the 

impediment to malignant cells that cause skin cancer. They also used homology modeling to create the 3D 

framework of a BRAF(V600E) protein genotype, which they confirmed using the Ramachandran plot [15]. 

AllerScreener predicts protein allergenicity by analyzing HLA binders derived from recognized allergens. By

creating binders to HLA class II proteins, it could predict whether a given substance is safe to eat or drink 

[16]. AlgPred v 2.0 provides various options, including searching for motifs in proteins found by MEME/MAST 

and MERCI features such as BLAST-based similarity searches and IgE epitope mapping [17]. Wang et al. 

demonstrated the superiority of their proposed technique by using numerous supervised algorithms as 

baseline classifiers. The greatest AUC value was 0.9578 for the deep learning model, which was superior to 

the ensemble learning and baseline approaches, according to the results of 5-fold cross-validation [18]. 

This paper reports the development of a set of novel allergen prediction models that build on the approach
of a publicly available server, AllerTOP v.2 [10]. We propose Long Short-Term Memory (LSTM), a rapid
recurrent neural network model, for protein allergen detection. LSTM is primarily used to handle long-term
dependency problems and is well suited to time series or other sequential data. Because protein data are
also sequential, LSTM is expected to be a suitable classification method with enhanced performance. Several
machine learning and ensemble learning algorithms were also evaluated for comparison.

Methods 

Protein data sets 

We gathered a total of 2,427 allergens and 2,427 non-allergens from a variety of sources, including the 

Central Science Laboratory and the National Center for Biotechnology Information (NCBI). The redundancies 

were eliminated. A sample protein sequence is shown in Figure 1. 

>gi|83715928|dbj|BAE54429.1| tropomyosin [Sepia esculenta]
MDAIKKKMLAMKMEKEVATDKAEQTEQSLRDLEDAKNKTEEDLSTLQKKYSNLENDFDNA
NEQLTAANTNLEASEKRVAECESEIQGLNRRIQLLEEDLERSEERLTSAQSKLEDASKAA
DESERGRKVLENRSQGDEERIDLLEKQLEEAKWIAEDADRKFDEAARKLAITEVDLERAE

Figure 1. Sample protein sequence from the dataset in FASTA format
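As a rough illustration of how records like the one in Figure 1 could be read before any descriptors are computed, the following Python sketch parses FASTA text into (header, sequence) pairs. The function name parse_fasta and the toy records are assumptions made for this example; the source does not describe the authors' own parsing code.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines inside or between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input (made-up identifiers and short sequences).
example = """>toy_protein_1 example allergen
MDAIKK
KMLAMK

>toy_protein_2 example non-allergen
MKVLAA
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Joining the sequence lines into one string per record gives the variable-length input that the later transformation has to bring to a uniform length.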

E-descriptors 

Five E-descriptors were used to characterize the protein sequences of allergens and non-allergens [10]. 

Venkatarajan et al. used principal component analysis to calculate the quantitative descriptor values based 

on 237 physical-chemical properties of amino acids. PCA of amino acid properties extracted 5 orthogonal  E-

descriptors, which are the eigenvectors of the covariance matrix [19]. We have considered the same five E-

descriptors that were used to define the characteristics of amino acids [10]. E1 denotes the hydrophilic nature 

of peptides, E2 their length, E3 their tendency for helical formation, E4 their abundance and distribution, and 

E5 their tendency for β strand formation.  

Auto cross-covariance transformation 

Proteins are composed of amino acid sequences, each of which is distinct and varies in length. The ACC
transformation was therefore employed to convert the variable-length sequences to a uniform length so that
the classification algorithms could be applied to them. The five E-descriptors, derived from 237
physicochemical properties of amino acids, were used as by Dimitrov et al. [10].


Auto Cross Covariance includes Auto Covariance and Cross Covariance. The equation to calculate Auto 

covariance is as follows: 

$$\mathrm{ACC}_{jj}(\mathrm{lag}) = \sum_{i=1}^{n-\mathrm{lag}} \frac{E_{j,i} \times E_{j,i+\mathrm{lag}}}{n-\mathrm{lag}} \qquad (1)$$

The equation to calculate Cross covariance is as follows: 

$$\mathrm{ACC}_{jk}(\mathrm{lag}) = \sum_{i=1}^{n-\mathrm{lag}} \frac{E_{j,i} \times E_{k,i+\mathrm{lag}}}{n-\mathrm{lag}}, \quad j \neq k \qquad (2)$$

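Indices j and k denote the E-descriptors (j, k = 1, 2, 3, 4, 5), n is the number of amino acids in a
sequence, the index i is the amino acid position (i = 1, 2, ..., n), and lag is the distance between the two
positions being compared (lag = 1, 2, ..., L), where the maximum lag L cannot exceed the length of the
shortest sequence. To investigate the influence of close amino acid proximity on protein allergenicity, a
short range of lags (lag = 1, 2, 3, 4, 5) was used [20]. With five descriptors and five lags, each protein is
thus represented by 5 × 5 × 5 = 125 ACC values.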
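The transformation defined by equations (1) and (2) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name acc_transform is hypothetical, the input is assumed to be an n × 5 array holding the five E-descriptor values of each residue, and random numbers stand in for real descriptor values, which are not reproduced in this paper.

```python
import numpy as np


def acc_transform(descriptors, max_lag=5):
    """Auto/cross-covariance features for one sequence.

    descriptors: array of shape (n, d), one row of d descriptor values per residue.
    Returns a vector of length d * d * max_lag, where for each lag the entry
    for the pair (j, k) is sum_i E[i, j] * E[i + lag, k] / (n - lag).
    """
    e = np.asarray(descriptors, dtype=float)
    n, d = e.shape
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        # (d, d) matrix: diagonal = auto-covariance, off-diagonal = cross-covariance
        m = e[: n - lag].T @ e[lag:] / (n - lag)
        features.append(m.ravel())
    return np.concatenate(features)


# Placeholder descriptor values for two sequences of different length.
rng = np.random.default_rng(0)
short_seq = rng.normal(size=(40, 5))
long_seq = rng.normal(size=(300, 5))
print(acc_transform(short_seq).shape, acc_transform(long_seq).shape)
```

Both sequences map to vectors of the same length (125 for five descriptors and five lags), which is what allows fixed-input classifiers to be applied to proteins of different sizes.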

The classification methods adopted in this research 

For classification, a few machine learning, ensemble learning, and deep learning algorithms have been 

considered, which include: The Gaussian Naive Bayes, Radius Neighbour's Classifier, Bagging Classifier, ADA 

Boost, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Extra Tree Classifier, and LSTM, which 

have been implemented in Python.  

Hochreiter and Schmidhuber proposed LSTM (Long Short-Term Memory) in 1997. LSTMs are a kind of RNN
(Recurrent Neural Network) designed to resolve the long-term dependency problem. A conventional RNN suffers
from the vanishing gradient problem, which makes learning long sequences difficult; LSTM addresses this
with gated memory cells. The LSTM model was built iteratively: layers of different types and parameters
were added, and dropout layers were experimented with. The final network consists of four layers, three
with ReLU activation functions and one with a softmax function. Categorical cross-entropy was used as the
loss function with "rmsprop" as the optimizer.
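To make the gating idea concrete, the sketch below implements the forward pass of a single standard LSTM cell in NumPy. It is only an illustration of the general mechanism: the weights are random rather than trained, the sizes (5 inputs, 8 hidden units) are arbitrary, and the names lstm_step and run_lstm are hypothetical. It is not the four-layer network described above, whose layer sizes and input shaping are not given in the source.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the order input gate, forget gate, output gate, candidate.
    """
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g      # cell state: gated mix of old memory and new candidate
    h = o * np.tanh(c)          # hidden state exposed to the next step
    return h, c


def run_lstm(sequence, W, U, b):
    """Feed a (T, D) sequence through the cell and return the final hidden state."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in sequence:
        h, c = lstm_step(x, h, c, W, U, b)
    return h


rng = np.random.default_rng(0)
D, H = 5, 8                                   # illustrative sizes only
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
sequence = rng.normal(size=(30, D))           # placeholder input, not real protein data
print(run_lstm(sequence, W, U, b).shape)      # (8,)
```

The additive update of the cell state c is the part that lets information and gradients persist over many steps, which is the property the paragraph above refers to.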

Evaluation of performance 

For training and testing purposes, the data were split in the ratio 80:20. The accuracy of each model on
the training and testing data is presented in Table 2. Allergens and non-allergens that were predicted
correctly were counted as true positives (TP) and true negatives (TN), respectively. Allergens and
non-allergens that were predicted incorrectly were counted as false negatives (FN) and false positives (FP),
respectively. Precision [TP/(TP + FP)] is the fraction of predicted allergens that are actual allergens.
Recall [TP/(TP + FN)] is the fraction of actual allergens that are correctly identified. The F1 score is
computed as [2 × (Precision × Recall)/(Precision + Recall)].
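The definitions above translate directly into code. The following Python sketch computes the four measures from confusion-matrix counts; the function names and the example counts are assumptions chosen for illustration and are not taken from the study.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Made-up counts for a balanced test set of 20 proteins.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```

When precision and recall are equal, the F1 score equals that common value, which is why a model reporting 0.91 for both also reports an F1 score of 0.91.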

Web server for allergenicity prediction 

A web server, namely ProAll-D, has been developed to predict the potential allergens using the LSTM 

algorithm. It is developed using the Python Django framework, which is fast and user-friendly. The detailed 

functioning of the webserver has been described in the supplementary section. 

Results and discussion 

ACC includes both auto-covariance and cross-covariance. Auto-covariance is calculated between values of the
same E-descriptor, for example between E1 and E1, at a given lag. In Figure 2, each output column is named
ACjkl, where j and k are the descriptor indices and l is the lag: AC111 is the auto-covariance of E1 with
E1 at lag 1. Cross-covariance is calculated between different E-descriptors, for example between E1 and E2,
at a given lag; examples of cross-covariance columns are AC121, AC131, AC145, and AC431. The details of the
ACC implementation are given in the supplementary data.
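Assuming the column labels follow the pattern AC, first descriptor index, second descriptor index, lag (an interpretation of the examples in the text, not something the source states explicitly), the full set of labels can be enumerated as in the Python sketch below. The function name acc_feature_names is hypothetical.

```python
def acc_feature_names(n_descriptors=5, max_lag=5):
    """Enumerate labels of the assumed form AC<j><k><lag>."""
    return [
        f"AC{j}{k}{lag}"
        for j in range(1, n_descriptors + 1)
        for k in range(1, n_descriptors + 1)
        for lag in range(1, max_lag + 1)
    ]


names = acc_feature_names()
auto = [name for name in names if name[2] == name[3]]    # same descriptor twice
cross = [name for name in names if name[2] != name[3]]   # two different descriptors
print(len(names), len(auto), len(cross))
```

Under this reading, 25 of the 125 columns are auto-covariance terms and the remaining 100 are cross-covariance terms.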

 
Figure 2. Output of ACC transformation  

 
The classification methods considered in this research are: The Gaussian Naive Bayes, Radius Neighbour's 

Classifier, Bagging Classifier, ADA Boost, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Extra 

Tree Classifier, and LSTM, which have been implemented in Python. 

Gaussian Naive Bayes is a probabilistic classifier based on the Naive Bayes algorithm that assumes normally
distributed features. For our dataset, this algorithm produced an accuracy of 64.14 percent (Table 1). The
Radius Neighbours Classifier is a variant of the KNN algorithm that bases its prediction on all instances
within a fixed radius of a new instance rather than on its k nearest neighbours. This was expected to be
beneficial for our dataset, but the model failed to provide the expected results, with an accuracy of 49.2
percent; its recall of 1.00 at a precision of 0.49 (Table 1) indicates that it labelled nearly every
sequence as an allergen. AdaBoost, also known as Adaptive Boosting, is an ensemble method in which the
weights of the instances are reallocated at each iteration, with larger weights applied to misclassified
instances. Boosting is used in supervised learning to reduce bias as well as variance. The accuracy of this
model was 76.9 percent.

Linear and Quadratic Discriminant Analysis resulted in accuracies of 76.13 and 84.2 percent, respectively.
A Bagging classifier is an ensemble classifier that fits base classifiers on random samples of the data and
then combines their predictions to obtain the final output. This method resulted in an accuracy of 85.8
percent. The Extra Tree classifier, a variant of the random forest algorithm, resulted in an accuracy of 90
percent. The LSTM model resulted in an accuracy of 91.5 percent.

Table 1. Analysis of the effectiveness of different classifiers

Method                             Accuracy (%)   Precision   Recall   F1-Score
Gaussian Naive Bayes               64.14          0.74        0.46     0.56
Radius Neighbours Classifier       49.2           0.49        1.00     0.66
ADA Boost                          76.9           0.79        0.75     0.77
Linear Discriminant Analysis       76.13          0.80        0.71     0.75
Quadratic Discriminant Analysis    84.2           0.93        0.74     0.83
Bagging Classifier                 85.8           0.89        0.86     0.88
Extra Tree Classifier              90             0.95        0.85     0.90
LSTM (Long Short-Term Memory)      91.5           0.91        0.91     0.91

 
Table 1 presents the accuracy and the other performance metrics of all the algorithms implemented. The LSTM
approach achieves the highest accuracy and F1-score and is the only model whose precision and recall are
both above 0.90, so its performance is consistent across all measures. LSTM was therefore selected for
protein allergen prediction.


Table 2. Analysis of the classification results of different classifiers based on training and testing data.

Method                             Training data (%)   Testing data (%)
Gaussian Naive Bayes               63.8                59.7
Radius Neighbours Classifier       50.5                47.6
ADA Boost                          84.6                78.1
Linear Discriminant Analysis       78.6                74.9
Quadratic Discriminant Analysis    88.7                81.6
Bagging Classifier                 88.4                86.3
Extra Tree Classifier              93.4                89.8
LSTM (Long Short-Term Memory)      94.1                91.5

Table 2 presents the accuracy of the algorithms on the training and testing datasets. The LSTM method was
applied to a well-known benchmark data set for protein allergen identification, in which a protein has to
be classified as allergen or non-allergen. LSTM delivers high classification performance and is
substantially quicker than other algorithms with comparable classification performance. LSTM is five times
faster than distance-based classification algorithms and two times faster than the quickest SVM-based
methods (which have lower classification performance than LSTM). All the implemented methods were tested
and compared using performance evaluation measures. The top-performing model was LSTM, which had an
accuracy of 91.51 percent. LSTM is a more sophisticated version of the RNN (recurrent neural network) and
was chosen for our problem because of its robustness against long-term dependency problems. Since the
positions in a protein sequence are also correlated with each other, LSTM is a suitable method for
capturing long-term dependencies and overcomes the drawbacks of the alignment method.

In Table 3, the performance of the LSTM model is compared with that of nine freely available servers. The
LSTM achieved an accuracy of 91.5 percent.

Table 3. Assessment of web servers for allergenicity prediction

Server                     Accuracy
Allerhunter                0.871
AlgPred (SVM_single_aa)    0.775
AlgPred (SVM_dipeptide)    0.796
AlgPred (ARP)              0.842
APPEL                      0.783
ProAp (motif)              0.505
ProAp (SVM)                0.843
AllerTop v.1               0.828
AllergenFP                 0.879
AllerTop v.2               0.887
LSTM model                 0.915

A ROC (receiver operating characteristic) curve plots the true positive rate against the false positive
rate at every classification threshold to assess the effectiveness of a classification model. AUC measures
how well a model distinguishes between the positive and negative classes. Because AUC is an aggregate
measure over the whole ROC curve, it does not depend on any single classification threshold.

Figure 3 indicates that the true positive rate increases together with the false positive rate and that the
curve flattens beyond a point near 0.91. The area under the ROC curve between (0,0) and (1,1) is defined as
the area under the curve (AUC). AUC essentially aggregates the model's performance over all threshold
values.

The LSTM model has the highest AUC, indicating that it is the best model for correctly classifying
observations. The ProAll-D server software has been developed to predict potential allergens using the LSTM
algorithm. It is built with the Python Django framework, which is fast and user-friendly.


Figure 3. The LSTM model’s ROC curve  
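For readers who want to see how a ROC curve and its AUC are obtained from predicted scores, here is a minimal NumPy sketch that sweeps the decision threshold and integrates with the trapezoidal rule. The function names and the four-sample example are assumptions for illustration; they are not the study's predictions, and both classes must be present in the labels.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by lowering the threshold step by step."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    order = np.argsort(-s, kind="stable")     # highest score first
    y, s = y[order], s[order]
    tps = np.cumsum(y == 1)                   # true positives at each cut
    fps = np.cumsum(y == 0)                   # false positives at each cut
    # keep one point per distinct score so tied scores move together
    cuts = np.r_[np.where(np.diff(s))[0], len(s) - 1]
    tpr = np.r_[0.0, tps[cuts] / tps[-1]]
    fpr = np.r_[0.0, fps[cuts] / fps[-1]]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy labels (1 = allergen) and scores, chosen by hand.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, scores)
print(auc(fpr, tpr))  # 0.75
```

Because every threshold contributes a point to the curve, the resulting area summarizes the ranking quality of the scores rather than the accuracy at one particular cut-off.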

The server has three sections, namely Home, Datasets, and Method Description, as shown in Figure 4. In the
Home section, the user enters the protein sequence in one-letter code, and the model predicts whether the
entered sequence is allergenic or non-allergenic, as shown in Figure 5. In the Datasets section, we have
uploaded the data considered in our research in FASTA format, as shown in Figure 6. The Method Description
section provides the user with a brief description of the methodologies we have considered.

 
Figure 4. Interface of ProAll-D for protein allergen detection.

 

 
Figure 5. Working of ProAll-D. 

 
Figure 6. Data-set Section 

                                                
The supplementary section describes the detailed functioning of the web application. 

Conclusions 

This study builds on and extends earlier work on the analysis of potential protein allergens. Our first aim
was to update the technique while keeping the previously established factors in mind. We evaluated deep
learning, ensemble learning, and machine learning models, namely Gaussian Naive Bayes, Radius Neighbours
Classifier, Bagging Classifier, ADA Boost, Linear Discriminant Analysis, Quadratic Discriminant Analysis,
Extra Tree Classifier, and LSTM, to predict the allergenicity of proteins. Extensive testing produced
excellent results that were superior to and corroborated earlier research. The best-performing model, LSTM,
had an AUC value of 0.9152. To our knowledge, this is so far the only study to apply the aforementioned
methods to evaluate protein allergenicity, and it may serve as a paradigm for future protein allergen
prediction.

Conflict of interest: All the authors declare no conflict of interest. 

References  
 

[1] M.B. Stadler, B.M. Stadler. Allergenicity prediction by protein sequence. FASEB J. 17 (2003) 1141.
https://doi.org/10.1096/fj.02-1052fje.

[2] R.E. Poms, E. Anklam, M. Kuhn. Polymerase chain reaction techniques for food allergen detection.
Journal of AOAC International 87 (2004) 1391-1397. https://doi.org/10.1093/jaoac/87.6.1391.

[3] S. Saha, G.S. Raghava. AlgPred: prediction of allergenic proteins and mapping of IgE epitopes.  Nucleic 
acids research 34 (2006) W202-W209. https://doi.org/10.1093/nar/gkl343.  

[4] H.C. Muh, J.C. Tong.  AllerHunter: a SVM-pairwise system for assessment of allergenicity and allergic 
cross-reactivity in proteins. PloS one 4 (2009) e5861. https://doi.org/10.1371/journal.pone.0005861. 

[5] H. Mohabatkar, B.M. Mohammad, K. Abdolahi, S. Mohsenzadeh. Prediction of allergenic proteins by 
means of the concept of Chou's pseudo amino acid composition and a machine learning approach. 
Medicinal Chemistry 9 (2013) 133-137. https://doi.org/10.2174/157340613804488341. 

[6] S. Vijayakumar, P.T.V. Lakshmi. A fuzzy inference system for predicting allergenicity and allergic
cross-reactivity in proteins. IEEE International Conference on Bioinformatics and Biomedicine, Tongji
University, China (2013) 49-52. https://doi.org/10.1109/BIBM.2013.6732458.

[7] I. Dimitrov, D.R. Flower, I. Doytchinova. AllerTOP - a server for in silico prediction of allergens.
BMC Bioinformatics 14 (Suppl 6) (2013) S4. https://doi.org/10.1186/1471-2105-14-S6-S4.

[8] I. Dimitrov, L. Naneva, I. Doytchinova, I. Bangov. AllergenFP: allergenicity prediction by descriptor 
fingerprints. Bioinformatics 30 (2014) 846-851. https://doi.org/10.1093/bioinformatics/btt619. 

[9] I. Dimitrov, L. Naneva, I. Bangov, I. Doytchinova. Allergenicity prediction by artificial neural networks. 
Journal of Chemometrics 28 (2014) 282-286. https://doi.org/10.1002/cem.2597. 

[10] I. Dimitrov, I. Bangov, D.R. Flower, I. Doytchinova. AllerTOP v.2 - a server for in silico prediction of
allergens. Journal of Molecular Modeling 20 (2014) 1-6. https://doi.org/10.1007/s00894-014-2278-5.

[11] H.X. Dang, C.B. Lawrence. Allerdictor: fast allergen prediction using text classification techniques.
Bioinformatics 30 (2014) 1120-1128. https://doi.org/10.1093/bioinformatics/btu004.

[12] S.S. Negi, W. Braun. Cross-React: a new structural bioinformatics method for predicting allergen cross-
reactivity. Bioinformatics 33 (2017) 1014-1020. https://doi.org/10.1093/bioinformatics/btw767. 

[13] G.S. Ladics. Assessment of the potential allergenicity of genetically-engineered food crops. Journal of 
Immunotoxicology 16 (2019) 43-53. https://doi.org/10.1080/1547691X.2018.1533904. 

[14] S. Maurer-Stroh, N.L. Krutz, P.S. Kern, V. Gunalan, M.N. Nguyen, V. Limviphuvadh, F. Eisenhaber, G.F. 
Gerberick. AllerCatPro-prediction of protein allergenicity potential from the protein sequence. 
Bioinformatics 35 (2019) 3020-3027. https://doi.org/10.1093/bioinformatics/btz029. 

[15] M.S. Pallavi, H.S. Pramod Kumar. In-silico Analysis to Determine the Efficient Drug for Malignant 
Melanoma using Molecular Dynamics. Biomedical and Pharmacology Journal 13 (2020) 1463-1470. 
https://dx.doi.org/10.13005/bpj/2018. 

[16] I. Dimitrov, M. Atanasova. AllerScreener – a server for allergenicity and cross-reactivity prediction. 
Cybernetics and information technologies 20 (2020) 175- 184. https://doi.org/10.2478/cait-2020-0071. 

[17] N. Sharma, S. Patiyal, A. Dhall, A. Pande, C. Arora, G.P.S Raghava. AlgPred 2.0: an improved method for 
predicting allergenic proteins and mapping of IgE epitopes. Briefings in Bioinformatics 22 (2021) 
bbaa294. https://doi.org/10.1093/bib/bbaa294. 


[18] L. Wang, D. Niu, X. Zhao, X. Wang, M. Hao, H. Che. A Comparative Analysis of Novel Deep Learning and 
Ensemble Learning Models to Predict the Allergenicity of Food Proteins. Foods 10 (2021) 809. 
https://doi.org/10.3390/foods10040809. 

[19] M.S. Venkatarajan, W. Braun. New quantitative descriptors of amino acids based on multidimensional 
scaling of a large number of physical–chemical properties. Molecular modeling annual 7 (2001) 445-
453. https://doi.org/10.1007/s00894-001-0058-5. 

[20] I.A. Doytchinova, D.R. Flower. VaxiJen: a server for prediction of protective antigens, tumour antigens
and subunit vaccines. BMC Bioinformatics 8 (2007) 1-7. https://doi.org/10.1186/1471-2105-8-4.

 
©2022 by the authors; licensee IAPC, Zagreb, Croatia. This article is an open-access article distributed under the terms and 

conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/)  
