key: cord-278182-75u57fw1 authors: Goh, Gerard Kian-Meng; Dunker, A. Keith; Foster, James A.; Uversky, Vladimir N. title: Shell disorder analysis predicts greater resilience of the SARS-CoV-2 (COVID-19) outside the body and in body fluids date: 2020-03-31 journal: Microb Pathog DOI: 10.1016/j.micpath.2020.104177 sha: doc_id: 278182 cord_uid: 75u57fw1

The coronavirus (CoV) family consists of viruses that infect a variety of animals, including humans, with varying levels of respiratory and fecal-oral transmission depending on the behavior of the viruses' natural hosts and optimal viral fitness. A model to classify and predict the respiratory and fecal-oral transmission potentials of the various viruses was built before the outbreak of MERS-CoV, using AI and empirically based molecular tools to predict the disorder level of proteins. Using the percentages of intrinsic disorder (PID) of the nucleocapsid (N) and membrane (M) proteins of CoVs, the model readily clustered the viruses into three groups, with SARS-CoV (M PID = 8%, N PID = 50%) falling into Category B, in which viruses have intermediate levels of both respiratory and fecal-oral transmission potentials. Later, MERS-CoV (M PID = 9%, N PID = 44%) was found to be in Category C, which consists of viruses with lower respiratory transmission potential but higher fecal-oral transmission capabilities. Based on the peculiarities of its disorder distribution, SARS-CoV-2 (M PID = 6%, N PID = 48%) has to be placed in Category B. Our data show, however, that SARS-CoV-2 is unusual in having one of the hardest protective outer shells (M PID = 6%) among coronaviruses. This means that it might be expected to be highly resilient in saliva or other body fluids and outside the body. An infected body is likelier to shed greater numbers of viral particles, since the particles are more resistant to antimicrobial enzymes in body fluids. These particles are also likelier to remain active longer.
These factors could account for the greater contagiousness of SARS-CoV-2 and have implications for efforts to prevent its spread.

In December 2019, a novel virus causing pneumonia, with symptoms such as dry cough and fever, was noticed to be spreading in Wuhan, Hubei Province, China [1] [2] [3] [4]. The culprit was quickly identified as a new respiratory coronavirus (CoV). This virus and its disease are currently known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease 2019 (COVID-19), respectively [1] [2] [3] [4]. SARS-CoV-2 is observed to be even more contagious (but fortunately less fatal) than the SARS-CoV seen in 2003. In 2011-2012, just before the outbreak of the Middle East respiratory syndrome coronavirus (MERS-CoV), we built an empirically based model that measures the percentage of intrinsic disorder (PID) of the membrane (M) and nucleocapsid (N) proteins in viruses [5, 6]. The main tool uses AI technology to recognize intrinsic disorder from the protein sequence. The model lists the shell disorder of each coronavirus; the viruses cluster readily into three groups, which correlate with the known levels of fecal-oral and respiratory transmission of the various coronaviruses. The model places the 2003 SARS-CoV (M PID = 8% and N PID = 50%) in a category of CoVs with intermediate levels of both respiratory and fecal-oral transmission potentials. An opportunity to test the validity of this model came when MERS-CoV first struck the Middle East in 2012. Once the sequences of both the MERS-CoV M and N proteins became available, we were able to confirm that MERS-CoV falls solidly into a category of viruses that have lesser respiratory transmission potential but greater fecal-oral transmission capabilities [6, 7].
The categorization of MERS-CoV within this category was supported by the observations that MERS-CoV does not spread efficiently among humans, unlike SARS-CoV, and requires close contact [8]. Furthermore, it is now known that camels are the natural reservoir for MERS-CoV, and viruses that are associated with farm animals, such as camels, tend to have greater fecal-oral transmission potentials [9]. In this study, we have the opportunity to test the model once again, this time with SARS-CoV-2. We shall see that the model is not only reliable and consistent but also able to detect something very odd about this virus that could account for its quick spread.

https://doi.org/10.1016/j.micpath.2020.104177 Received 17 February 2020; Received in revised form 18 March 2020; Accepted 27 March 2020

Protein intrinsic disorder occurs when part or all of a protein lacks a stable structure. Such proteins are also described as unstructured or natively unfolded [10] [11] [12]. The main tool used to develop the CoV shell disorder model relies on neural networks that have been trained to recognize ordered and disordered regions of a protein from its sequence. A suite of such artificial intelligence (AI) tools has been named PONDR® (http://www.pondr.com). Of particular interest, PONDR® VLXT has been found to be accurate for structural proteins with protein-protein and protein-RNA/DNA interactions [13] [14] [15] [16] [17]. In fact, PONDR® VLXT has shown consistency and reproducibility when used to study the structural viral shells that protect the virion from environmental damage.
A wide range of viruses has been studied using PONDR® VLXT, including equine infectious anemia virus (EIAV), human immunodeficiency virus (HIV), herpes simplex virus (HSV), smallpox virus, polio virus, Nipah and Ebola viruses, SARS-CoV, MERS-CoV, influenza virus, hepatitis C virus (HCV), hepatitis A virus (HAV), yellow fever virus (YFV), and Zika virus [2, [5] [6] [7] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27].

The coronavirus virion contains four major structural proteins: the spike (S) protein, the nucleocapsid (N) protein, the membrane (M) protein, and the envelope (E) protein [5] [6] [7] [28] [29] [30]. These proteins are needed for production of a structurally complete viral particle [30] [31] [32] [33]. The main focus of our intrinsic disorder-based model is on the viral M and N proteins. The M protein of CoV was chosen over the E protein because the M protein is a major transmembrane protein that is found in large numbers in the virion, whereas the E protein is a minor protein of the envelope. Also, in previous studies, membrane or matrix proteins of various viruses were analysed with respect to their intrinsic disorder content and their roles in protecting the respective virions. The N protein is also important for our model, as greater disorder in the inner shell has been shown to be associated with the mode of infection and virulence in other viruses [23] [24] [25] [26].

The protein sequences of the selected CoV M and N proteins were downloaded from UniProt (http://www.uniprot.org). The sequences were loaded into a MySQL database using Java [19] and then submitted to PONDR® VLXT, and the corresponding results were archived in the database. PONDR® VLXT provides an intrinsic disorder predisposition score for each residue. Residues with scores of 0.5 and above are considered disordered, and, conversely, residues with scores below 0.5 are taken as ordered.
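The per-residue rule just described can be sketched in a few lines. This is a minimal illustration only, assuming the predictor's scores are already available as a list of floats; the function names and the toy scores are hypothetical and are not PONDR® VLXT output. Each residue is called disordered at a score of 0.5 or above, and the share of disordered residues is reported as a percentage (the PID used throughout the paper).

```python
from typing import List, Sequence

# Cutoff taken from the text: scores of 0.5 and above count as disordered.
DISORDER_CUTOFF = 0.5


def disorder_calls(scores: Sequence[float], cutoff: float = DISORDER_CUTOFF) -> List[bool]:
    """Return True for each residue whose score is at or above the cutoff."""
    return [score >= cutoff for score in scores]


def percent_disorder(scores: Sequence[float], cutoff: float = DISORDER_CUTOFF) -> float:
    """Percentage of residues called disordered (hypothetical helper)."""
    if not scores:
        raise ValueError("no residue scores supplied")
    calls = disorder_calls(scores, cutoff)
    return 100.0 * sum(calls) / len(calls)


# Made-up scores for a 10-residue toy peptide, purely for illustration.
toy_scores = [0.12, 0.48, 0.50, 0.73, 0.91, 0.33, 0.05, 0.49, 0.66, 0.20]
print(percent_disorder(toy_scores))
```

A score of exactly 0.5 is counted as disordered, which matches the "0.5 and above" wording; a score of 0.49 is not.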
We represented the outputs of these analyses as the percentage of intrinsic disorder (PID), which is defined as the number of residues predicted to be disordered divided by the total number of residues in a query protein, expressed as a percentage. Statistical analyses using regression analysis and Analysis of Variance (ANOVA) were done using the R statistical package [34]. Regression analyses were done using category as the dependent variable, with M and N PIDs as independent variables or with N PID as the sole independent variable. The three groups were assigned the values 1, 2 and 3 according to their level of respiratory transmission potential. The amino acid sequences of the SARS-CoV-2 M and N proteins are available at NCBI (https://www.ncbi.nlm.nih.gov/nuccore/MN908947).

As mentioned above, the model predicting the mode of viral transmission was built before the 2012 MERS-CoV outbreak. This model was based mainly on the PIDs of the N proteins, by which the viruses were clustered into three groups according to their transmission mode, while the M PIDs provide an indicator of the resilience of the virus in harsh environments. The three groups are represented in Table 1. A Two-Way ANOVA (p < 0.001) indicates that the groups are statistically distinguishable. The table shows that category A consists of viruses that have greater respiratory transmission capabilities but lower fecal-oral transmission potentials. Category B involves coronaviruses that have intermediate levels of both respiratory and fecal-oral transmission potentials, whereas category C comprises viruses with lower respiratory transmission potential but higher fecal-oral transmission potential. Furthermore, higher inner shell disorder is often associated with greater infectivity, especially with regard to respiratory transmission potential, since disorder allows greater promiscuity of binding [6, 7].
Table 1 shows that SARS-CoV is grouped into category B along with other viruses that have intermediate respiratory and fecal-oral transmission potentials. MERS-CoV falls into category C (lower respiratory potential but higher fecal-oral transmission capabilities). These placements are consistent with what we know about the transmission behaviour of SARS-CoV and MERS-CoV [5] [6] [7] [8] 35]. It should be noted that category A contains viruses with N PID above 53%, whereas category C comprises those with N PID below 47%. Since SARS-CoV-2 has an N PID of 48%, which lies between these two cutoffs, it must be placed in category B.

3.3. SARS-CoV-2 has one of the hardest outer shells among CoVs (M PID = 6%)

As already mentioned, the same analysis of SARS-CoV-2 (M PID = 6%, N PID = 48%) makes it clear that this virus has to be placed in Category B, along with SARS-CoV (Table 1). However, our data also point to something odd about SARS-CoV-2. Table 1 and Fig. 1 show that the M PID of SARS-CoV-2 is 6%, the second lowest in our fairly diverse sample of CoVs. This means that SARS-CoV-2 has one of the hardest outer shells among its counterparts. The model is consistent with the theme that viruses that remain in harsh environments require harder, i.e., less disordered, shells to survive. Furthermore, the outer shell is likely to play an even greater role than the inner shell in protecting the virion [20, 22, 27]. This has important implications for the characteristics and behaviors of the virus, as we shall see below.

A superficial glance at the model, which places SARS-CoV and SARS-CoV-2 in the same category with intermediate respiratory transmission potentials, may mislead the reader into believing that the model is not reliable, given that SARS-CoV-2 has now been seen to be much more contagious than SARS-CoV. We believe that this is not the case.
If SARS-CoV-2 were placed in category C, the same category as MERS-CoV, the model would clearly be wrong in terms of reproducibility. On the other hand, placing SARS-CoV-2 in category A would be an inconsistency that would be difficult to explain. Very few CoVs fall into category A, the exceptions being HCoV-229E and infectious bronchitis virus (IBV), which makes the chance that any CoV, including SARS-CoV-2, falls into category A slim. More importantly, because SARS-CoV-2 is zoonotic and the only non-human CoV in category A is the avian IBV, placing SARS-CoV-2 in category A could suggest that this virus is of avian origin, which is inconsistent with what we know about SARS-CoV-2. Phylogenetic study has confirmed that the closest relatives of SARS-CoV-2 are bat CoVs [4, 36]. We also noticed that at least three of its closest neighbors, with similar N PIDs of 47-48%, are of bat origin (see Table 1). This suggests that bat CoVs are likely to have an optimal mix of fecal-oral and respiratory transmission potentials, reflected in their N PID range of 47-48%, that provides greater fitness for the spread of the viruses among bats, and that SARS-CoV-2 is likely to be of bat origin too, given that its N PID is 48%. There are highly debatable suggestions that snakes or pangolins [3, 37, 38] may be an intermediary host of the virus. Even if any of these is later shown to be true, our data imply that the virus has not had the chance to evolve much in any intermediary host other than bats, unlike SARS-CoV and MERS-CoV, for which the model does suggest evolution in hosts such as civet cats and camels, respectively. This suggestion is, however, based on N PID alone.
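The N PID cutoffs quoted earlier (category A above 53%, category C below 47%, category B in between) can be written as a simple rule. The sketch below is only an illustration of that rule, not the authors' clustering procedure; the function name is hypothetical, and the only data used are the (M PID, N PID) pairs quoted in the text. M PID is carried along purely as the resilience indicator and plays no part in the category assignment here.

```python
def transmission_category(n_pid: float) -> str:
    """Assign category A, B, or C from N PID in percent (illustrative rule only)."""
    if n_pid > 53:
        return "A"  # higher respiratory, lower fecal-oral potential
    if n_pid < 47:
        return "C"  # lower respiratory, higher fecal-oral potential
    return "B"      # intermediate levels of both


# (M PID, N PID) pairs as quoted in the text, in percent.
quoted_pids = {
    "SARS-CoV": (8, 50),
    "MERS-CoV": (9, 44),
    "SARS-CoV-2": (6, 48),
    "TGEV": (14, 43),
}

for virus, (m_pid, n_pid) in quoted_pids.items():
    category = transmission_category(n_pid)
    print(f"{virus}: category {category}, M PID = {m_pid}%, N PID = {n_pid}%")
```

With these inputs the rule reproduces the placements stated in the text: SARS-CoV and SARS-CoV-2 in category B, MERS-CoV and TGEV in category C.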
The CoV shell disorder model was originally conceived as a spin-off from a parent research project that studied relationships among the mode of transmission, shell disorder and the absence of vaccines for certain sexually transmitted viruses such as HIV, HSV and HCV [19, 20, 27]. At that time, even after the 2003 SARS-CoV outbreak, most CoV knowledge lay in the field of veterinary medicine [5, 6]. Using what was then known about the behaviors of porcine CoVs, a successful attempt was made to correlate M and N PIDs with modes of transmission [5, 6]. The results were similar to those found in Table 1 and Fig. 1, without the data pertaining to MERS-CoV and SARS-CoV-2, of course, since the MERS-CoV and COVID-19 outbreaks had yet to occur. Evidence of a strong correlation of M and N PIDs with the mode of transmission (r² = 0.83, p < 0.001) can be seen in the regression analysis. Furthermore, a strong but somewhat weaker correlation (r² = 0.77, p < 0.001) is found when N PID is the sole independent variable, whereas only a poor correlation (r² = 0.21) is found when M PID is the sole independent variable. This means that the model should include both M and N PIDs, as the statistical model suggests that M PID does contribute, even if slightly, to the mode of transmission in conjunction with the major contribution of N PID. This is consistent with the idea that both M and N play roles in protecting the virion from environmental damage and that the need to protect the virion depends on the mode of transmission. A qualitative inspection of the viruses in the three groups also reveals consistency. For example, as in Table 1, canine CoV-respiratory falls into category B, while canine CoV-enteritis is in category C. The porcine CoVs are also consistent with observations of their behaviors, as we will see later.
We need to keep in mind, however, that the outer shell is likely to play a greater and more immediate role in protecting the virion from environmental damage, as it encases the entire virion, whereas the N protein is likely to protect only the RNA. M PID is therefore a more crucial indicator of the "hardness" of the virion. However, the higher correlation of N PID with the mode of transmission suggests that other mechanisms are also involved. In fact, studies of the inner shells of other viruses, such as Nipah virus, Ebola virus and flaviviruses [23] [24] [25] [26], have indicated strong correlations between inner shell disorder and virulence. In the case of Nipah virus, the virulence and inner shell disorder were linked to the modes of transmission [26]. The mechanism by which the virus acquires greater virulence via inner shell disorder lies in the ability of the viral protein to bind promiscuously to host proteins [6, [23] [24] [25] [26]. This ability provides for rapid replication of the viral proteins and particles. Quick replication is also an efficient way to evade the host immune system: a large number of viral particles is produced before the immune system even recognizes the virus, which then often goes on to overwhelm the body, leading to the death of the host [6, [23] [24] [25] [26]. One highly plausible explanation for the link between rapid replication and the modes of transmission is that the viral load in the body fluids of the infected host needs to be sufficiently high before the virus can be infectious via the respiratory mode. This would explain why rapid viral replication and inner shell disorder were found to be the common link between virulence and respiratory transmission in the case of Nipah virus [26]. We do not yet know, however, whether CoV virulence correlates with inner shell disorder, although we believe that it does, as in the other viruses.
If so, N will make an excellent target for CoV vaccine development. The difficulty in the search for such links is that CoVs infect a large variety of animals and each CoV often uses a different receptor [4, 6, 39]. For example, MERS-CoV has a case-fatality rate of above 30% [8] in humans but is generally harmless to camels [27]. Furthermore, CoVs were generally not known to be virulent before the 2003 SARS-CoV outbreak, and we therefore lack adequate data to work with. The reason that we can extrapolate findings on inner shell disorder among unrelated viruses with somewhat greater confidence is that the inner shell proteins of different species of viruses often share similar functions and structures [30, 40, 41]. For example, the CoV N proteins, like the nucleocapsids of other viruses, are responsible for assembling viral particles by forming complexes with other viral proteins and RNA [24, 25, 41]. The N protein also helps with the packaging and budding of viral particles in the host endoplasmic reticulum (ER) [23, 30, 40]. Such tasks require binding to host proteins, and the promiscuous protein-binding capabilities of a more disordered N protein are likely to help towards rapid replication of the virus.

Since both SARS-CoV and SARS-CoV-2 have intermediate respiratory transmission potentials, how do we account for the greater contagiousness of the latter? We need to understand that the respiratory transmission potential of a single viral particle is just one of several factors that might contribute to the contagiousness of the virus. Other factors include how many particles are released by the host and how long a particle remains active in the environment. We mentioned that our model has detected something odd about SARS-CoV-2: this virus has one of the hardest outer shells within the family, as seen in Fig. 1 and Table 1 (M PID = 6%).
Since the outer shell plays a greater role than the inner shell in protecting the virion, the harder outer shell gives the virus greater resilience to the environment outside the body and to the digestive enzymes found in saliva, mucus, and other bodily fluids [20, 22, [42] [43] [44] [45] [46]. As a result, a virus with a harder outer shell is able to remain active for a longer time and, therefore, fewer viral particles are required for a chance to infect someone. Furthermore, because the virus is more resistant to the digestive enzymes in the bodily fluids, an infected body is likely to discharge more infectious particles. Evidence of the protective role of the outer shell can be seen in a wide variety of viruses. Viruses associated with saliva (e.g., YFV, ZIKV, EIAV, rabies) or fecal-oral transmission (e.g., poliovirus) have hard outer shells with low PID, whereas sexually transmitted viruses (e.g., HIV, HSV-2, HCV) have higher outer shell PIDs [6, [20] [21] [22] [23] [24] [25] [26] [27]. Also, viruses that are notorious for lasting in the environment for a long time, such as the smallpox virus, have low outer shell PIDs.

Before the SARS-CoV outbreak, coronaviruses were not considered medically important, as they had been known primarily as cold viruses that cause minor sniffles. This is, however, not true in the field of veterinary medicine, where coronaviruses have posed a menace to livestock. This is especially so in the case of TGEV (transmissible gastroenteritis virus) and PEDV (porcine epidemic diarrhea virus). TGEV can move rapidly among farm pigs and quickly devastate the farming community if not controlled early [5] [6] [7]. Our analysis reveals that TGEV is in category C (higher fecal-oral, lower respiratory transmission potentials), with M and N PIDs of 14% and 43%, respectively.
The reason that it is able to move rapidly is that its high fecal-oral transmission potential represents a more efficient mode of transmission among farm animals. There is an antigenically related cousin, PEDV, which has characteristics similar to those of TGEV [29]. As with TGEV, diarrhea and vomiting are the main symptoms of PEDV infection. While TGEV is highly infectious, PEDV is less contagious among farm pigs. A peculiar characteristic has been observed by farmers and veterinarians: PEDV can reappear, seemingly out of nowhere, in a pen that was previously occupied by infected pigs, even several months after the disease has passed. An inspection of the M and N PIDs reveals the reason: PEDV belongs to category B, along with SARS-CoV and SARS-CoV-2, unlike TGEV and MERS-CoV. This means that PEDV has a higher respiratory transmission potential than TGEV, which implies greater chances of fecal-respiratory transmission for the former. Evidently, viral particles in fecal material that was inadvertently left behind go on to infect a new population of pigs several months later [5] [6] [7]. Infectious droplets from mucus, vomit, or feces were actually seen during the SARS-CoV outbreak. A stark example was the 2003 outbreak at Amoy Gardens, a housing complex in Hong Kong, where a huge cluster of infected patients was found and inefficient sewerage and toilet ventilation systems were responsible for the spread [5, 35, 47]. We need to remember that PEDV, SARS-CoV, and SARS-CoV-2 all belong to the same category B, and that SARS-CoV-2 has one of the hardest outer shells within the CoV family (Table 1). SARS-CoV-2 is thus likely to be able to remain infectious in the environment for an especially long time, whether in feces, mucus, vomit, sweat, or saliva.
A puzzling observation is that PEDV (M PID = 8%) has a harder outer shell than TGEV (M PID = 14%), even though TGEV is predicted to have the higher fecal-oral transmission potential. One would expect greater fecal-oral potential to require a harder outer shell among CoVs. The result implies that this is not necessarily so: higher fecal-oral transmission potential can provide greater efficiency of spread among farm animals, as seen with TGEV, such that the virus does not have to stay long outside the body. The ability to stay longer in the environment could, however, provide advantages to a virus with greater respiratory transmission potential, especially with respect to spread via airborne virus from feces or body fluids. This is likely the case for SARS-CoV-2.

We have presented here a set of data that not only place SARS-CoV-2 within the category of CoVs that have intermediate levels of both respiratory and fecal-oral transmission potentials, alongside SARS-CoV and PEDV, but also predict that SARS-CoV-2 has one of the hardest outer shells among CoVs. This peculiarity is likely responsible for its high level of contagiousness, since the hardness of its outer shell could give the virus greater resilience to conditions outside the body and in body fluids: the harder shell better protects the virion from damage by the hostile environment and by the action of the digestive enzymes found in bodily fluids. As a result, it is likely that the infected body sheds more infectious particles, each of which has a greater chance of infecting a person over the particle's lifetime. The chances of infection via indirect contact and via airborne virus from feces and bodily fluids are therefore greater. The results of our research could have important implications for epidemiologists and public health officials.

GKMG conceived the idea, collected and analyzed the data, and wrote the first draft. VNU helped with the collection and analysis of literature data, and reviewed and revised the draft. AKD and JAF reviewed the manuscript and provided the resources necessary for the research.

GKMG is an independent researcher and the owner of Goh's BioComputing, Singapore. GKMG has also written a book ("Viral Shapeshifters: Strange Behaviors of HIV and Other Viruses") on a related subject. The authors have no other potential conflicts of interest.

[1] WHO, Novel coronavirus
[2] Rigidity of outer shell predicted by protein disorder model sheds light on COVID-19 (Wuhan-2019-nCoV) infectivity
[3] Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2
[4] A pneumonia outbreak associated with a new coronavirus of probable bat origin
[5] Understanding viral transmission behavior via protein intrinsic disorder prediction: Coronaviruses
[6] Viral Shapeshifters: Strange Behaviors of HIV and Other Viruses, Simplicity Research Institute
[7] Prediction of intrinsic disorder in MERS-CoV/HCoV-EMC supports a high oral-fecal transmission
[8] WHO, Middle Eastern respiratory syndrome coronavirus (MERS-CoV)
[9] Identification of MERS-CoV in dromedary camels
[10] Intrinsically unstructured proteins
[11] Why are "natively unfolded" proteins unstructured under the physiological conditions?
[12] Intrinsically unstructured proteins: Re-assessing the protein structure-paradigm
[13] Predicting protein disorder for N-, C-, and internal regions
[14] Predicting binding regions within disordered proteins
[15] Sequence complexity of disordered protein
[16] Mining alpha-helix-forming molecular recognition features with cross species sequence alignments
[17] Coupled folding and binding with alpha-helix-forming molecular recognition elements
[18] Viral disorder or disordered viruses: do viral proteins possess unique features?
[19] Protein intrinsic disorder toolbox for comparative analysis of viral proteins
[20] A comparative analysis of viral matrix proteins using disorder predictors
[21] Protein intrinsic disorder and influenza virulence: the 1918 H1N1 and H5N1 viruses
[22] Shell disorder, immune evasion and transmission behaviors among human and animal retroviruses
[23] Detection of links between Ebola nucleocapsid and virulence using disorder analysis
[24] Correlating flavivirus virulence and levels of intrinsic disorder in shell proteins: protective roles vs. immune evasion
[25] Zika and flavivirus disorder: virulence and fetal morbidity
[26] Nipah shell disorder, mode of transmission and virulence
[27] HIV vaccine mystery and viral shell disorder
[28] Fundamentals of Molecular Virology
[29] Antigenic relationships among porcine epidemic diarrhea virus and transmissible gastroenteritis virus strains
[30] Fielding BC. The coronavirus nucleocapsid is a multifunctional protein
[31] The molecular biology of coronaviruses
[32] Efficient assembly and release of SARS coronavirus-like particles by a heterologous expression system
[33] MERS-CoV virus-like particles produced in insect cells induce specific humoural and cellular immunity in rhesus macaques
[34] R: A Language and Environment for Statistical Computing
[35] Twenty-first Century Plague: the Story of SARS
[36] Genomic characterisation and epidemiology of 2019 novel-coronavirus
[37] Repurposing of clinically approved drugs for treatment of coronavirus disease 2019 in a 2019-novel coronavirus (2019-nCoV) related coronavirus model
[38] Cross-species transmission of the newly identified coronavirus 2019-nCoV
[39] Bactrian camels shed large quantities of Middle East respiratory syndrome coronavirus (MERS-CoV) after experimental infection
[40] Characterization of filovirus protein-protein interactions in mammalian cells using bimolecular complementation
[41] Specific interaction of capsid protein and importin alpha/beta influences West Nile virus production
[42] Antiviral activities in saliva
[43] Lysozyme in body fluids
[44] From bacterial killing to immune modulation: recent insights into functions of lysozyme
[45] Antimicrobial peptides and skin defense immune system
[46] Innate antimicrobial activity of nasal secretions
[47] Evidence of airborne transmission of the Severe Acute respiratory syndrome virus