key: cord-309898-sju15hev authors: Hu, Yiwen; Buehler, Markus J. title: Comparative analysis of nanomechanical features of coronavirus spike proteins and correlation with lethality and infection rate date: 2020-11-02 journal: Matter DOI: 10.1016/j.matt.2020.10.032 sha: doc_id: 309898 cord_uid: sju15hev The novel coronavirus disease, COVID-19, has spread rapidly around the world. Its causative virus, SARS-CoV-2, enters human cells through the physical interaction between the receptor-binding domain (RBD) of its spike protein and the human cell receptor ACE2. Here, we provide a novel way in understanding coronavirus spike proteins, connecting their nanomechanical features – specifically its vibrational spectrum and quantitative measures of mobility – with virus lethality and infection rate. The key result of our work is that both, the overall flexibility of upward RBD and the mobility ratio of RBDs in different conformations, represent two significant factors that show a positive scaling with virus lethality and an inverse correlation with the infection rate. Our analysis shows that epidemiological virus properties can be linked directly to pure nanomechanical, vibrational aspects, offering an alternative way of screening new viruses and mutations, and potentially exploring novel ways to prevent infections from occurring. The novel coronavirus disease, COVID-19, has spread rapidly around the world [1] [2] [3] [4] [5] . Its causative virus, SARS-CoV-2, enters human cells through the interaction between the receptor-binding domain (RBD) of its spike protein and the cell receptor ACE2 [6] [7] [8] . Due to the significant role that coronavirus spike protein plays in receptor recognition, viral fusion and cell entry, it is a promising target for drug and vaccine development. Here, we provide a novel way towards better understanding the coronavirus spike proteins, connecting its nanomechanical features -especially their vibrational patterns -with virus lethality and infection rate 6 . In a broader context, the mechanics of proteins has long been a subject of interest, and this study shows how it can be a useful tool to help us understand complex disease etiology by connecting nanoscopic physical features with epidemiological data [9] [10] [11] [12] [13] [14] [15] . To provide a comparative analysis -specifically focused on how nano-level features relate with macroscopic epidemiolocal observables -we focus on different coronavirus types within the same family of pathogens. Over the past decades, several types of coronaviruses have emerged. The virus types HCoV-NL63 and HCoV-HKU1 are often reported to cause lower respiratory tract infections, while HCoV-OC43 and HCoV-229E are usually associated with comparatively mild symptoms similar to the common cold 5, 16, 17 . The ones that threaten public health more seriously are three highly pathogenic human coronaviruses -namely: SARS-CoV, MERS-CoV and SARS-CoV-2. SARS-CoV was first reported in China in November 2002, then quickly spread globally, resulting in over 8,000 infections with about 800 deaths 18 . MERS-CoV was first identified in Saudi Arabia in June 2012, featuring limited transmission with case fatality rate as high as 35% 19, 20 . SARS-CoV-2 was first reported in China in December 2019 3, 21, 22 . It can easily transmit from human to human, resulting in more than 40 million global cases as of October 2020 23, 24 . The spike protein of the coronavirus plays an essential role in receptor recognition, viral fusion and cell entry 6, 25, 26 . The process represents a complex mechano-chemical process, whereby during the entry into the host cell, the spike protein first binds to a cell receptor through the receptor-binding domain (RBD) and then begins the fusion process. It is believed that the RBDs of different coronaviruses recognize different cell receptors 3, 27 . SARS-CoV, SARS-CoV-2 and HCoV-NL63 recognize angiotensinconverting enzyme 2 (ACE2) as their receptor in the human body, while MERS-CoV recognizes dipeptidyl peptidase 4 (DPP4) as its receptor 1, 7, 8, 28 . In order to successfully bind to the receptor, the spike protein of a specific coronavirus must maintain a receptor-accessible state with at least one RBD in upward conformation. This is because otherwise there would be steric clashes hindering the binding process 4 . In experimental work, this specific type of receptor-accessible state has been captured for MERS-CoV, SARS-CoV and SARS-CoV-2 6, 29 . While the structure of coronavirus spike protein is well studied, much less attention has been focused on the connection between the mechanical features of the virus with virus lethality as well as infection rate. The structural basis of SARS-CoV viral infectivity has been explored to some extent, pointing out that the Trp-rich region of S protein is essential 30 . It has also been observed that cleavage of the spike protein of SARS-CoV is associated with viral infectivity 31 . However, there has been no lateral comparative study between similar coronaviruses on this type of connection. It remains a question which kind of mechanical and structural properties could possibly relate to the mortality and infection rate of the virus. If successful, this approach may provide an alternative or complementary way to screen viruses or mutations against large-scale epidemiological data, provide additional mechanistic insights into disease etiology, and offer potential targets for therapies or preventive measures. In protein science, normal mode analysis (NMA) has long been one of the most comprehensive yet efficient methods to calculate vibrational normal modes and analyze protein flexibility, which provides the rationale for use in this study 32 . Another reason behind the broad use of NMA is that the lowfrequency modes elucidated by NMA could often describe the real-world motions of a protein, and often bear important functional significance 33 . In NMA, the atoms are modeled as point masses, which are connected by springs that represents the interatomic force fields. After constructing the Hessian matrix based on the second-order partial derivatives of the potential energy function, the normal modes and corresponding frequencies can be directly obtained by diagonalizing the matrix and computing its eigenvalues 34 (further details see Experimental Procedures). SARS-CoV-2 enters human cells through the interaction between the receptor-binding domain (RBD) of its spike protein and the cell receptor ACE2, as is depicted in the schematic shown in Figure 1 . Our study hence focuses on the spike protein, which is essential for the infection to take place. The spike protein of coronavirus is composed of an amino (N)-terminal S1 subunit and a carboxyl (C)-terminal S2 subunit. The S1 subunit, which consists of an N-terminal domain (NTD) and three C-terminal domains (CTD), is responsible for recognizing and binding to the host cell receptor. It has been reported that for the betacoronavirus that utilizes CTD1 of its S1 subunit as RBD, there is a prerequisite conformational state for receptor binding where at least one RBD should be upward 4, 6 . In this paper, we refer to this receptoraccessible state as "open state" and receptor-inaccessible state as "closed state". As outlined in Figure 1 , we conduct a normal mode analysis (NMA) 33 , our open-state analysis is limited to these three highly pathogenic beta-coronaviruses. For closed-state exploration, since there have been no reports about closed conformational state of the MERS-CoV spike protein, we choose the HCoV-NL63 spike protein because it shares the same relevant cell receptor ACE2 with SARS-CoV and SARS-CoV-2 spike proteins. Figure 2 depicts data that shows that the lowest-frequency normal modes of MERS-CoV, SARS-CoV and SARS-CoV-2 spike proteins are all associated with a swing motion of upward receptor-binding domain (RBD) to different extents. This type of global movements corresponds well with the molecular motions directly reported in experiments 4, 6 . Considering the required receptor-accessible state for receptor binding, the swing of RBD is of functional significance since it is the likely way by which a spike protein changes from closed state to open state to facilitate the binding of target receptors. From a lateral aspect, this observed shared type of lowest-frequency normal mode movements indicates the structural similarity of MERS-CoV, SARS-CoV and SARS-CoV-2 spike proteins. According to our analysis, while the sequence identity of SARS-CoV S and SARS-CoV-2 S is as high as 78.8%, MERS-CoV S has only about 30% sequence identity with SARS-CoV S and SARS-CoV-2 S. This indicates that a small portion of the whole sequence of beta-coronavirus would largely determine the general structure topology and thus the overall global motion (details see Supplementary Material Figure S1 ). Figure S1 depicts a sequence alignment of MERS-CoV S, SARS-CoV S and SARS-CoV-2 S, where identical residues are denoted by *. The analysis reported in 36 reveals that the percentage identity in NTD and RBD, which reflect the major parts participating in the lowest normal mode movement, is about 22%. Compared with 30% sequence identity of the whole spike protein, this partial percentage identity is lower and confirms the concept that likely only a few sequence pieces ultimately determine the shared global movement of coronavirus spike protein in open state. We further note that the sequence similarity in the S2 subunit could play an important role, as it may contributes to the comparatively higher rigidness of the S2 subunit. While MERS-CoV, SARS-CoV and SARS-CoV-2 spike proteins share the same type of lowestfrequency normal mode movements, their fluctuation profiles differ dramatically, as is illustrated by Figure 3 . Generally speaking, the S1 subunit of coronavirus spike protein is much more flexible than its S2 subunit. Among these beta-coronaviruses, the active RBD of MERS-CoV spike enjoys super mobility and only SARS-CoV-2 spike protein and its D614G mutant exhibit a comparatively flexible downward RBD. Interestingly, when we focus on fluctuations over the upward RBD, the figure reveals that for all these spike proteins the fluctuations first slowly build up to its maximum and then decrease sharply, which demonstrates that the appearance of large flexibility of upward RBD is based on some common detailed structures of these beta-coronaviruses. Notably, panel (D) of Figure 3 shows that the D614G mutation decreases the general flexibility of upward RBD and significantly enhances the mobility of a limited region in one downward RBD in the spike protein. It is also shown that the D614G mutation results in a general slight flexibility decrease in the NTDs of all three chains, and importantly, no noticeable differences in other areas. Using the fluctuation profile data, two significant mechanical factors are identified, which are: (1) the overall flexibility of upward RBD and (2) the mobility ratio of RBDs in different conformations. Figure 4 provides a correlation diagram for MERS-CoV, SARS-CoV and SARS-CoV-2 spike protein, where the overall flexibility of upward RBD is evaluated by the average fluctuation of open-state RBD and the mobility ratio is quantified as the ratio of maximum fluctuations over upward and downward RBDs. The data shows that both factors have positive correlation with case fatality rate and inverse relationship with the virus infectivity. We find that for the flexibility ratio, the smaller it is, the larger the mobility of downward RBD is compared to upward one, which could indicate a larger possibility toward a second standing RBD. This would make it even easier for the spike protein to bind to the host cell receptor and thereby increasing the virus infectivity. On the other hand, if the flexibility of downward RBD is not large enough to generate conformational change, as flexibility ratio decreases, it becomes more difficult for the J o u r n a l P r e -p r o o f receptor to bind with the right RBD since the downward one is quite active. This may provide an explanation for the positive correlation between flexibility ratio and virus lethality that we see in the epidemiological data. One possible reason for the positive relationship between overall flexibility of upward RBD and the mortality rate could be that the flexible upward RBD is more active when binding to the receptor, and may hence benefit the subsequent membrane fusion process. Even though there is limited available empirical data at this point, there is actually an intrinsic negative relationship between the mortality rate and infection rate of MERS-CoV, SARS-CoV and SARS-CoV-2, which could help explain why the infectivity is inversely correlated with the general flexibility of upward RBD. While there are many other factors situated between nanomechanical and epidemiological aspects, such as binding affinities 37 and dysregulation of type I interferon responses 38 , the influence of which on different epidemiological characteristics of coronavirus has not been fully explained, and our analysis points out the direct correlation between nanomechanical features and the lethality and infection rate of coronavirus. The goal is to attempt to improve our understanding of the direct relationship between the nanoscale and epidemiological level, not considering internal relationships. As the results show, this perspective provides useful insights into the mechanics of disease relationships ( Figure S2 ). Figure 5 show different flexibility variations in SARS-CoV-2, SARS-CoV and HCoV-NL63 spike protein, which share the same human receptor ACE2. Among them, there exists a sharp increase and large variation in flexibility in RBD of SARS-CoV-2 S, while SARS-CoV S appears to have more flexible structural regions. HCoV-NL63, the only one in this comparative study that is classified as alpha-coronavirus and cause only mild symptoms, has a different S1 subunit topology, bringing more flexibility to NTD rather than CTD1, where its RBD is situated. Thus, our suggestion about the importance of the general flexibility of upward RBD in open-state analysis needs to be expanded. As for closed states, the overall flexibility of RBD in a single protomer shows positive relationship with virus lethality, since the disease severity is regressive in order of SARS-CoV, SARS-CoV-2 and HCoV-NL63. Notably, for these three coronaviruses in closed states, the overall flexibility of RBD of a single protomer is positively associated with their disease severity. Panels (E) to (G) in Figure 5 provide detailed structures of the RBD of the above virus complexed with their shared receptor human ACE2. The CTD1 in S1 subunit of coronavirus often contains a core structure, which is composed of several antiparallel βsheet and short connecting α-helices, and one or more extended loops. Theses extended loops, referred to as receptor binding motif (RBM), is located at the edge of the core structure and is usually responsible for realizing the interactions with the receptor if the virus uses CTD1 as its RBD. Beta-coronaviruses such as SARS-CoV and SARS-CoV-2 have a unique long-extended loop as their RBM, accounting for the most flexible region of their RBD in both open and closed states. As for HCoV-NL63, there are three separated short RBMs, which are quite restricted and unable to generated large mobility. These insights may also explain its lower-affinity interaction with ACE2, at least to some extent. We reported an analysis linking key nanomechanical vibrational features of various coronavirus spike proteins with epidemiological data. As shown in Figure 2 , structural similarity and major movement associated with the swing of upward RBD is seen throughout this family of viruses, representing a sort of universal feature of this class of pathogens. The molecular modeling results show that the general motion corresponds with experimental observation and have functional importance. We find that the active RBD of MERS-CoV enjoys super mobility, whereas only SARS-CoV-2 and its D614G variant show a comparatively flexible downward RBD. The more recently occurring D614G mutation decreases the general flexibility of upward RBD and largely enhances the mobility of some small region in one downward RBD in the spike protein. As shown in Figure 4 , the key result of this study, the general fluctuation profiles of upward RBD and the associated fluctuation ratio have a positive correlation with case fatality rate and inverse relationship with the virus infectivity. Our results offer two different explanations for the effects of the flexible downward RBD: (1), the possibility towards a second standing RBD if the mobility is large enough, and (2) that it could indicate difficulty for the receptor to bind with the right RBD if no conformational change happens. We hypothesize that there may be a possible threshold between these two effects, which could be studied in future research. These insights offer several possible applications, including a search for inhibitors that could bind to downward RBD may provide a viable strategy. We find a sharp increase and large variation in flexibility in SARS-CoV-2 S, whereas more flexible structural regions are present in the closed-state SARS-CoV S. The long-extended loop is unique for beta-coronaviruses, and also accounts for the most flexible region in open states. HCoV-NL63 S has three separated short RBMs, which are unable to generate large mobility. This may also explain its lower-affinity interaction with ACE2. As for possible applications, we may target the RBM to identify new inhibitors that lock the closed S-protein conformation. This opens a question whether perhaps, we can we utilize the significant flexibility difference for potential inhibitor screening for the development of novel treatment methods or design future experiments for gain-offunction research 39, 40 . Future work may address additional influences of temperature dependence (to explore whether seasonal variations of temperature and other environmental factors can be linked to nanoscopic phenomena). Other aspects may include a more detailed analysis of intermediate steps in the mechanistic hierarchical progression as schematically outlined in Figure S2 , including aspects of the strong age-dependence of COVID-19 disease progression. Please contact Prof. Markus Buehler via mbuehler@mit.edu. No new materials were generated in this work. All data are available upon request to the Lead Contact author. To assess the molecular mechanics from an atom-by-atom perspective, we conduct normal mode analyses (NMA) of coronavirus spike proteins in receptor-accessible state with one upward RBD as well as receptor-inaccessible state where all three RBDs are in downward position. We access the desired threedimensional protein structures from the Protein Data Bank 41 and prepare the atomistic models with Visual Molecular Dynamics (VMD) 2, 42 . Before normal mode analysis, 10,000 steps of conjugate gradient energy minimization are performed using Nanoscale Molecular Dynamics (NAMD) 43 in order to relax the protein structure 44 . No further MD simulation is performed since the protein structure from Protein Data Bank is already experimentally equilibrated. A coarse-grained elastic network model (ENM) available in the Bio3D package in R is employed to analyze the normal modes of coronavirus spike protein structures [45] [46] [47] . This model uses N, CA, C atoms to represent the protein backbone and selects 0 to 2 significant side chains based on their size and distance to CA atoms, proved to have comparable accuracy with all-atom ENM. Here, the atomic displacements are scaled for temperature 300K. For the SARS-CoV-2 D614G mutant, we implement the mutation to the open-state spike protein (PDB ID: 6VSB) and carry out 10,000 steps of energy minimization and MD simulation for 50 ns with NAMD. We then compute the average residue mean square derivations (RMSD) based on the last 25 ns equilibrium period and pick out 10 frames with the nearest RMSD from the trajectory file so that we could conduct normal mode analysis on them. The fluctuation profile of D614G mutant is calculated as the average of the normal mode fluctuations of these 10 configurations. We notice that some local unfolding occurs at the end of each chain, which induces abnormally high fluctuations in the fluctuation profile. Since these local events are far from the RBD, we do not consider them in the analysis and set the fluctuations of the terminal 5-8 residues to zero. For consistency, the same approach is used for the fluctuation difference of SARS-CoV-2 spike protein and the D614G mutant. The global confirmed case numbers and the case fatality rate presented in Figure 4 are updated as of Sep 1, 2020. According to the analysis based on GISAID SARS-CoV-2 sequence database, by Aug 31, 2020, 80% of the global sequence database were identified with D614G mutation (71242 sequences counted in total) 48, 49 . Based on this estimate we use 80% of the COVID-19 global case number as the infection number of SARS-CoV-2 with G614 and the remaining 20% as the case number of the original SARS-CoV-2 virus type. Since there has been little evidence assessing an association between D614G mutation and disease severity, the same case fatality rate is applied to the original SARS-CoV-2 spike protein and its D614G variant. The two factors depicted in Figure The causative virus of the COVID-19, SARS-CoV-2, enters human cells through the physical interaction between the receptor-binding domain (RBD) of its spike protein and the human cell receptor ACE2. While the structure of coronavirus spike protein is well studied, it remains unclear how those mechanical features of the virus affect its epidemiological characteristics. Here, we report that both, the overall flexibility of upward RBD and the mobility ratio of RBDs in different conformations, represent two significant factors that show a positive scaling with virus lethality and an inverse correlation with the infection rate. Our analysis shows that epidemiological virus properties can potentially be linked directly to pure nanomechanical, vibrational aspects, offering an alternative way of screening new viruses and mutations, and perhaps even novel ways to prevent infections from occurring. • This work provides a novel way in understanding coronavirus pathology using mechanics • Reports major movement associated with the swing of upward RBD for open-state viruses • Links key nanomechanical vibrational features directly to the epidemiological data • Provides possibility of screening new viruses or mutations from a mechanical aspect We provide a novel way towards understanding coronavirus spike proteins, connecting their nanomechanical features -specifically its vibrational spectrum and quantitative measures of mobilitywith virus lethality and infection rate. Our study shows that the nanomechanics of proteins -captured in their continuous motions -can be a useful tool to help us understand complex disease etiology by connecting nanoscopic physical features with epidemiological data. Potential application includes developing mechanical ways of screening new viruses and mutations, and exploring novel treatment methods. Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor Sartorius products Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding Human coronavirus NL63, a new respiratory virus Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Structural biology: Structure of SARS coronavirus spike receptor-binding domain complexed with receptor Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor Coiled-coil intermediate filament stutter instability and molecular unfolding Deformation and failure of protein materials in physiologically extreme conditions and disease Triangular core as a universal strategy for stiff nanostructures in biology and biologically inspired materials Nanomechanical sonification of the 2019-nCoV coronavirus spike protein through a materiomusical approach Molecular model of human tropoelastin and implications of associated mutations Nanomechanics of functional and pathological amyloid materials Sustained release silk fibroin discs: Antibody and protein delivery for HIV prevention Coronavirus HKU1 and other coronavirus infections in Hong Kong Epidemiology and clinical presentations of the four human coronaviruses 229E, HKU1, NL63, and OC43 detected over 3 years using a novel multiplex real-time PCR method SARS and MERS: Recent insights into emerging coronaviruses Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia MERS in South Korea and China: A potential outbreak threat? A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-toperson transmission: a study of a family cluster Clinical features of patients infected with 2019 novel coronavirus in Wuhan COVID-19 Map -Johns Hopkins Coronavirus Resource Center Estimates of the severity of coronavirus disease 2019: a model-based analysis Pre-fusion structure of a human coronavirus spike protein Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2 Structural basis of receptor recognition by SARS-CoV-2 Molecular basis of binding between novel human coronavirus MERS-CoV and its receptor CD26 Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains Importance of SARS-CoV spike protein Trprich region in viral infectivity Cleavage of spike protein of SARS coronavirus by protease factor Xa is associated with viral infectivity Building-block approach for determining low-frequency normal modes of macromolecules Global dynamics of proteins: Bridging between structure and function Analysis of the vibrational and sound spectrum of over 100,000 protein structures and application in sonification DynaMut: Predicting the impact of mutations on protein conformation, flexibility and stability The EMBL-EBI search and sequence analysis tools APIs in 2019 Cell entry mechanisms of SARS-CoV-2 Dysregulation of type I interferon responses in COVID-19 Might SARS-CoV-2 Have Arisen via Serial Passage through an Animal Host or Cell Culture?: A potential explanation for much of the novel coronavirus' distinctive genome Ethical and Philosophical Considerations for Gain-of-Function Policy: The Importance of Alternate Experiments The Protein Data Bank The protein data bank NAMD: A parallel, object-oriented molecular dynamics program Scalable molecular dynamics with NAMD Harmonicity in slow protein dynamics A new approach for determining lowfrequency normal modes in macromolecules Building-block approach for determining low-frequency normal modes of macromolecules Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus COVID-19 and cardiovascular disease: from basic mechanisms to clinical perspectives We acknowledge support from the MIT-IBM AI lab, ONR (N000141612333 and N000141912375), AFOSR (FATE MURI FA9550-15-1-0514), NIH (U01 EB014976), as well as ARO (W911NF1920098). The authors declare no competing interests.