key: cord-267046-ewnjgps5 authors: strauss, james h; strauss, ellen g title: virus evolution: how does an enveloped virus make a regular structure? date: 2001-04-06 journal: cell doi: 10.1016/s0092-8674(01)00291-4 sha: doc_id: 267046 cord_uid: ewnjgps5
ancestral source. they propose a fit of e1 into the cryo-em density of semliki forest virus determined to 9 å resolution (mancini et al., 2000). the paper by pletnev et al. (2001) shows that the bulk of e1 does not contribute to the outer portions of the spike but, instead, forms a layer closely apposed to the lipid bilayer, analogous to the position of e in the flavivirion. they also show that e2 projects upward to the full length of the spike. thus, e1 forms what has been called the skirt that surrounds the lipid bilayer and part of the lower domains of the spikes, whereas e2 forms the projecting part of the spike. the absence of spikes in flaviviruses could then be due to a difference between the cleaved e2 and m. e2 is ~420 amino acids in size, of which about 360 residues form the ectodomain, whereas m is only about 75 residues long, of which about 38 residues are present in the ectodomain. thus, one can imagine that an immature flavivirion containing prm (about 170 residues) rather than m might more resemble the alphavirus structure, with short projecting spikes, but cleavage to m removes the spikes.
the evolution of enveloped viruses
the parallels between the assembly of alphaviruses and flaviviruses and the similarities in structure revealed by the present studies suggest that an enveloped virus with an icosahedral structure arose long ago and has diverged into these two families. many other enveloped viruses whose structures are more or less known use quite different assembly mechanisms (
virus evolution virus taxonomy: seventh report of the international committee on taxonomy of viruses
key: cord-305173-95o5z685 authors: martin, thomas r.; wurfel, mark m. 
title: a triffic perspective on acute lung injury date: 2008-04-18 journal: cell doi: 10.1016/j.cell.2008.04.006 sha: doc_id: 305173 cord_uid: 95o5z685 acute lung injury (ali) is a leading cause of death in people infected with h5n1 avian influenza virus or the sars-coronavirus. imai et al. (2008) now report that ali is triggered by the signaling of oxidized phospholipids through toll-like receptor 4 (tlr4) and the adaptor protein trif. these findings provide insight into the molecular pathogenesis of ali, a condition for which treatment options are currently very limited. acute lung injury (ali) affects more than 200,000 people in the us each year, with approximately 75,000 deaths, making it an important cause of morbidity, mortality, and health care expenditure (rubenfeld et al., 2005). bacterial and viral infections are important risk factors for ali, but aspiration of gastric contents, major trauma, and repeated transfusions are additional risks. ali is also a leading cause of death in people infected with h5n1 avian influenza virus or the coronavirus that causes sars (severe acute respiratory syndrome). in this issue, imai et al. (2008) report surprising insights from murine studies that provide a new perspective on the mechanisms contributing to ali in humans. 
the alveolar membrane of the lungs is the largest surface area of the body that is in continuous contact with the outside environment, and a complex set of defenses has evolved to protect it against inhaled particulates and microbes. the alveolar wall is a delicate structure, consisting of a thin alveolar epithelial layer, a basement membrane composed of collagens, glycoproteins, and glycosaminoglycans, and a thin endothelial cell layer. surfactant phospholipids and associated proteins lining the alveolar surface are critical in reducing surface tension in alveolar fluid, so that alveoli do not collapse at low lung volumes. cells called type ii pneumocytes in the alveolar walls produce surfactant and actively transport sodium ions from the lumen to the interstitium, facilitating passive water movement from the alveoli to the interstitium and lymphatics in order to keep the airspaces dry. acute damage to epithelial or endothelial cells in the alveolar membrane causes the clinical syndrome of ali, in which the alveolar spaces fill with proteinaceous exudates, producing severe alterations in gas exchange, critical hypoxemia, and death in the absence of aggressive medical care. the hallmark findings of ali include acute neutrophilic inflammation and an array of proinflammatory cytokines in the lungs, suggesting that activation of innate immunity is an initial event, whether or not overt infection is present. activation of innate immune pathways combined with the physical stresses created by mechanical ventilation causes a synergistic increase in lung injury, but the mechanisms underlying ali are not clear (dos santos and slutsky, 2006). in order to identify susceptibility factors for lung injury, imai and colleagues screened several strains of mice using a simple model of lung injury, intratracheal instillation of 1.5 n hydrochloric acid (hcl), which approximates severe gastric acid aspiration. 
surprisingly, mice with an inactivating mutation in toll-like receptor 4 (tlr4) were protected from lung injury in this noninfectious model. tlr4 is the primary receptor for gram-negative bacterial lipopolysaccharide (lps) and also recognizes endogenous stimuli termed "alarmins" at sites of inflammation (oppenheim et al., 2007). in macrophages, tlr4 signals via two different intracellular adaptor proteins, myd88 and trif (tir-domain-containing adaptor-inducing interferon-β), leading to two distinct intracellular signaling programs (beutler, 2004). the myd88 pathway causes rapid nf-κb activation and cytokine production. the trif pathway leads to the production of type i interferons via interferon regulatory factor 3 (irf-3) and also causes delayed nf-κb activation via activation of tnf receptor-associated factor 6 (traf6) (hoebe et al., 2003; sato et al., 2003). surprisingly, imai and colleagues found that trif-deficient mice and mice lacking traf6 in myeloid cells were protected from hcl-induced injury, whereas myd88 knockout mice were not, suggesting that the trif pathway, acting through traf6, is the major effector pathway in this noninfectious model. they also showed that trif-dependent lung injury is likely to be mediated by production of interleukin 6 (il-6), as il-6-deficient mice were also protected from injury. the finding that the tlr4-trif pathway mediated injury in the absence of an infectious agent raised questions about the identity of the stimulus for tlr4, and the mechanism responsible for preferential activation of the trif pathway. the lung lavage fluid of hcl-treated mice contained oxidized phospholipids (oxpls) detected by immunocytochemistry. an anti-oxpl antibody significantly reduced the proinflammatory activity of lung lavage fluid on lung macrophages in vitro. 
intratracheal instillation of synthetically oxidized phospholipids caused lung inflammation in normal and surfactant-depleted mice, whereas mice lacking tlr4 were protected. the monoclonal antibody used to detect oxpl provided a clue to the specific oxpl responsible because it recognizes phospholipids containing oxidized phosphatidylcholine (e.g., 1-palmitoyl-2-arachidonoyl-phosphatidylcholine, oxpapc). oxpapc was shown to stimulate il-6 production from lung macrophages via the tlr4-trif-traf6 pathway in vitro, independently of myd88. these findings contrast sharply with signaling initiated by lps, which occurs predominantly through tlr4-myd88. in the complex inflammatory response initiated by hcl in the lungs, one might expect that tlr4 would be activated by several different endogenous stimuli; however, mice lacking tlr4, trif, or traf6 all resisted hcl- as well as oxpapc-induced inflammation, supporting a role for oxpapc as an important stimulus of tlr4 activation in this model. because patients infected with the influenza virus or sars-coronavirus often develop severe lung injury, the authors looked for oxpl in the lungs of mice infected with an inactivated h5n1 avian influenza virus. as in the hcl injury model, immunohistochemical analysis identified oxpapc in the lungs, but mice lacking tlr4 or trif had lung inflammation that was much less severe. mice lacking the ncf1 protein, which lack an active nadph oxidase complex, were protected from viral lung inflammation and did not form oxpapc in the airspaces, further supporting a key role for oxidation of phospholipids in the pathogenic pathway. high levels of oxpapc were also detectable in the lungs of animals with experimental pulmonary infections due to bacillus anthracis or yersinia pestis, suggesting that oxpl-mediated lung injury is of more general significance. 
the relevance of this new mechanism for human lung injury was demonstrated by the observation that significant amounts of oxpapc were present in lung tissue samples from two patients with lethal h5n1 avian influenza infection and nine patients with ali following sars-coronavirus infection. these experimental results have surprising implications for understanding the pathogenesis of ali. a central role of tlr4 in lung injury has been suspected because lps is present in the lungs of many patients with ali, whether or not overt bacterial infection is present (martin et al., 1997). in addition, tlr4 is triggered by a number of different endogenous alarmins likely to be present in injured lungs (oppenheim et al., 2007). similarly, the formation of oxidized phospholipids in the lungs is not surprising given the intensely oxidative, neutrophil-rich environment in the lungs of patients with ali (sittipunt et al., 2001). however, the central role of tlr4 in acid-induced injury, the principal role of oxpl in triggering tlr4 signaling, the predominant role of the trif-traf6 signaling pathway (figure 1), and the general applicability of the findings to important respiratory viral infections are all unexpected. paradoxically, the intravenous administration of oxpapc protects mice from lps-induced lung injury by protecting the endothelial barrier, suggesting that oxpapc might have different effects in the alveolar and intravascular compartments (nonas et al., 2006). these seemingly discrepant findings could be reconciled in part if systemic challenge with oxpapc directly (via tlr4) or indirectly desensitizes the activation of circulating leukocytes. given that the myd88 pathway is critical to the host response to bacterial infections (skerrett et al., 2007), the results of imai and colleagues suggest that new strategies to modulate the trif-traf6 pathway, while leaving the myd88 pathway largely intact, might be beneficial in some forms of ali. although the proximal event that creates the initial oxidative environment in the lungs remains unclear, neutrophil recruitment and activation are likely to be important because of the neutrophil's potent respiratory burst and because of the protection noted in ncf1-deficient mice. likewise, the key molecular "switch" that controls whether trif or myd88 is activated by tlr4 remains a key unanswered question. almost 41 years after the clinical description of ali, we have only one treatment that definitely improves survival, and this involves reducing the volume of air applied to the lungs during mechanical ventilation (acute respiratory distress syndrome network, 2000). the work of imai and colleagues points to potential molecular approaches that could further improve outcomes for this clinically important syndrome. 
figure 1. the work of imai et al. (2008) provides evidence that acute lung injury involves oxidized phospholipids acting through toll-like receptor 4 (tlr4). in this model, injury to the lungs through acid aspiration or viral infection leads to activation of nadph oxidase (nadph ox) and production of reactive oxygen species (ros), which oxidize 1-palmitoyl-2-arachidonoyl-phosphatidylcholine (papc, oxpapc). oxpapc activates tlr4 expressed by myeloid cells (an alveolar macrophage is shown), and the intracellular signal is transduced by the adaptor proteins trif and traf6, leading to interleukin 6 (il-6) production, inflammation, and alveolar damage. pmn, polymorphonuclear leukocyte. 
acute respiratory distress syndrome network 
key: cord-321308-rwxhdg8r authors: grubaugh, nathan d.; hanage, william p.; rasmussen, angela l. title: making sense of mutation: what d614g means for the covid-19 pandemic remains unclear date: 2020-07-03 journal: cell doi: 10.1016/j.cell.2020.06.040 sha: doc_id: 321308 cord_uid: rwxhdg8r abstract korber et al. (2020) found that a sars-cov-2 variant in the spike protein, d614g, rapidly became dominant around the world. while clinical and in vitro data suggest that d614g changes the virus phenotype, the impact of the mutation on transmission, disease, and vaccine and therapeutic development is largely unknown. following the emergence of sars-cov-2 in china in late 2019, and the rapid expansion of the covid-19 pandemic in 2020, questions about viral evolution have come tumbling after. did sars-cov-2 evolve to become better adapted to humans? more infectious or transmissible? more deadly? virus mutations can rise in frequency due to natural selection, random genetic drift, or features of recent epidemiology. as these forces can work in tandem, it's often hard to differentiate when a virus mutation becomes common through fitness or by chance. it is even harder to determine if a single mutation will change the outcome of an infection, or a pandemic. the new study by korber et al. 
(2020) sits at the heart of this debate. they present compelling data that an amino acid change in the virus' spike protein, d614g, emerged early during the pandemic, and viruses containing g614 are now dominant in many places around the world. the crucial questions are whether this is the result of natural selection, and what it means for the covid-19 pandemic. for viruses like sars-cov-2, transmission really is everything: if they don't get into another host, their lineage ends. korber et al. (2020) hypothesized that the rapid spread of g614 was because it is more infectious than d614. in support of their hypothesis, the authors provided evidence that clinical samples from g614 infections have higher levels of viral rna, and that g614 pseudoviruses produced higher titers in in vitro experiments, results that now seem to be corroborated by others (e.g., hu et al., 2020; lorenzo-redondo et al., 2020; ozono et al., 2020; wagner et al., 2020). still, these data do not prove that viruses containing g614 are more infectious or transmissible than viruses containing d614. and because of that, many questions remain on the potential impacts, if any, that d614g has on the covid-19 pandemic. to answer these questions, we must first explore how g614 became the dominant genotype, and what impacts it may have on transmission. as an alternative hypothesis to the one described above, the increase in the frequency of g614 may be explained by chance, and the epidemiology of the pandemic. in february, the area with the most covid-19 cases shifted from china to europe, and then in march on to the us. as this and other work shows, the great majority of sars-cov-2 lineages in the us arrived from europe, which is unsurprising considering the amounts of travel between the continents. whether lineages become established in a region is a function not only of transmission, but also the number of times they are introduced. 
there is good evidence that for sars-cov-2, a minority of infections are responsible for the majority of transmission (endo et al., 2020). therefore, while most introductions go extinct, those that make it, make it big (lloyd-smith et al., 2005). over the period that g614 became the global majority variant, the number of introductions from china, where d614 was still dominant, was declining, while the number from europe climbed. this alone might explain the apparent success of g614. even if viruses containing g614 got "lucky" in escaping china, the variant may still provide a transmission boost. the clinical and in vitro data provided by korber et al. (2020) certainly make this a plausible scenario. however, higher detection of sars-cov-2 rna in oral and nasal swabs may not be a direct reflection of transmission potential. in addition, much transmission likely happens in the presymptomatic stage, and we don't know how these differences during the symptomatic phase compare. the pseudovirus assays used in this study can demonstrate the ability to infect a cell in culture and the results are important, but it's not clear what they mean for the ability to productively transmit to a new host. these assays don't account for the effect of other viral or host proteins, and the parade of biochemical host-pathogen interactions that must occur to support infection and transmission. therefore, as prior experience with the 2013-2016 ebola epidemic suggests (marzi et al., 2018), it's impossible to conclude that a single mutation alone would have a major impact in a large, diverse human population based on in vitro infectivity and fitness data. if g614 truly is more transmissible in equivalently mixing populations, then yes, the virus will be harder to control. but we cannot definitively answer this question at the moment. so far there is no evidence that infection with sars-cov-2 containing the g614 variant will lead to more severe disease. 
by examining clinical data from 999 covid-19 cases diagnosed in the united kingdom, korber et al. (2020) found that patients infected with viruses containing g614 had higher levels of virus rna, but did not find a difference in hospitalization outcomes. these clinical observations are supported by two independent studies: 175 covid-19 patients from seattle, wa (wagner et al., 2020) and 88 covid-19 patients from chicago, il (lorenzo-redondo et al., 2020). viral load and disease severity are not always correlated, particularly when viral rna is used to estimate virus titer. the current evidence suggests that d614g is less important for covid-19 than other risk factors, such as age or comorbidities. while the d614g mutation is located in the virus' external spike protein that receives a lot of attention from the human immune system, and thus could have an influence on the ability of sars-cov-2 to evade vaccine-induced immunity, we think that this is unlikely for the following reasons. d614g is not in the receptor-binding domain (rbd) of the spike protein, but in the interface between the individual spike protomers, which stabilizes the mature trimeric form of the spike on the virion surface through hydrogen bonding. korber et al. (2020) propose that the mutation may result in the loss of between-protomer hydrogen bonds, modulate interactions between spike protomers, or change glycosylation patterns. while any of these changes could alter infectivity, it is less likely that they would drastically alter the immunogenicity of rbd epitopes thought to be important for antibody neutralization. furthermore, korber et al. (2020) and others (hu et al., 2020; ozono et al., 2020) found that the antibodies generated from natural infection with viruses containing d614 or g614 could cross-neutralize, suggesting that the locus is not critical for antibody-mediated immunity. 
the d614g mutation is therefore unlikely to have a major impact on the efficacy of vaccines currently in the pipeline, some of which exclusively target the rbd. because the specific effect of d614g on spike function in entry and fusion is unknown, the impact of this mutation on therapeutic entry inhibitors is unknown. there is no current evidence that it would interfere with therapeutic strategies such as monoclonal antibodies designed to disrupt spike binding with ace2 or drugs that modulate downstream processes such as endosomal acidification. however, until we better understand the role of d614g during natural sars-cov-2 infection, the mutation should be taken into consideration for any vaccine or therapeutic design. while there has already been much breathless commentary on what this mutation means for the covid-19 pandemic, the global expansion of g614, whether through natural selection or chance, means that this variant now is the pandemic. as a result, its properties matter. it is clear from the in vitro and clinical data that g614 has a distinct phenotype, but whether this is the result of bona fide adaptation to human ace2, whether it increases transmissibility, or whether it will have a notable effect, is not clear. the work by korber et al. (2020) provides an early base for more extensive epidemiological, in vivo experimental, and diverse clinical investigations to fill in the many critical gaps in how d614g impacts the pandemic. 
estimating the overdispersion in covid-19 transmission using outbreak sizes outside china
the d614g mutation of sars-cov-2 spike protein enhances viral infectivity and decreases neutralization sensitivity to individual convalescent sera
tracking changes in sars-cov-2 spike: evidence that d614g increases infectivity of the covid-19 virus
superspreading and the effect of individual variation on disease emergence
a unique clade of sars-cov-2 viruses is associated with lower viral loads in patient upper airways
recently identified mutations in the ebola virus-makona genome do not alter pathogenicity in animal models
naturally mutated spike proteins of sars-cov-2 variants show differential levels of cell entry
comparing viral load and clinical outcomes in washington state across d614g mutation in spike protein of sars-cov-2
key: cord-252433-0e9lonq4 authors: cullen, bryan r. title: viral rnas: lessons from the enemy date: 2009-02-20 journal: cell doi: 10.1016/j.cell.2009.01.048 sha: doc_id: 252433 cord_uid: 0e9lonq4 viruses are adept at evolving or co-opting genomic elements that allow them to maximize their replication potential in the infected host. this evolutionary plasticity makes viruses an invaluable system to identify new mechanisms used not only by viruses but also by vertebrate cells to modulate gene expression. here, i discuss the identification and characterization of viral mrna structures and noncoding rnas that have led to important insights into the molecular mechanisms of eukaryotic cells. 
the hiv-1 tat protein activates transcription of the hiv-1 provirus by recruiting the cellular p-tefb complex to the viral tar rna hairpin. nuclear export of incompletely spliced hiv-1 transcripts is facilitated by the rre rna structure, which recruits a complex consisting of the viral rev protein and cellular crm1. a similar function is performed by the cte rna structure present in mpmv, which recruits the cellular tap nuclear export factor. the cytoplasmic translation of picornaviral mrnas is facilitated by ires elements, and the translation of retroviral and coronaviral mrnas is modulated by sequences that induce ribosomal frameshifting. finally, the translation of both viral and cellular mrnas can be specifically downregulated by virally encoded micrornas. to tar, p-tefb mediates the phosphorylation of negative regulators of transcription elongation and of the carboxy-terminal domain of rnap ii molecules that have initiated transcription of hiv-1 proviral dna. these phosphorylation events render rnap ii elongation competent and allow it to transcribe the entire viral genome (barboric and peterlin, 2005; kao et al., 1987). in contrast, in the absence of tat or tar, transcription initiation at the ltr promoter still occurs but almost all of these initiating rnap ii molecules fall off the dna template within ~200 bp of the transcription start site. analysis of tat function led to the realization that not only transcription initiation but also transcription elongation can regulate gene expression levels in animal cells (barboric and peterlin, 2005). 
almost all retroviruses contain a single rnap ii-dependent promoter element in the viral ltr that drives transcription of an initial genome-length rna that also acts as an mrna for translation of the viral gag and pol proteins (cullen, 2003) . in the case of hiv-1, this initial transcript can also be processed into fully spliced transcripts encoding the tat and rev proteins of hiv-1 as well as the auxiliary protein nef. alternatively, this transcript can be processed into partially spliced mrnas encoding the three other viral auxiliary proteins vif, vpu, and vpr. the hiv-1 replication cycle, therefore, requires that the single initial viral transcript is exported out of the nucleus in several differentially spliced forms. these include an unspliced form that programs gag and pol expression and that is packaged into virion progeny; partially spliced forms that program expression of env, vif, vpr, and vpu; and fully spliced forms that program expression of tat, rev, and nef (cullen, 2003) . the difficulty with this scenario is that eukaryotic cells do not normally permit the nuclear export of intron-containing mrnas. almost all cellular mrnas are transcribed as intron-containing pre-mrnas, and these introns are recognized in the nascent transcript by splicing factors, including commitment factors. commitment factors both commit the pre-mrna to the splicing pathway and retain the pre-mrna in the nucleus until all introns are removed (legrain and rosbash, 1989) . hiv-1 mrnas rely entirely on cellular factors for appropriate splicing, and intron-containing hiv-1 transcripts are therefore also retained in the infected cell nucleus by splicing commitment factors. the strategy that hiv-1 has evolved to circumvent this nuclear retention is dependent on the viral rev protein, which is translated from a fully spliced viral mrna that is constitutively exported from the nucleus. 
as a result, hiv-1 mutants lacking a functional rev gene are able to express the proteins encoded by fully spliced viral mrnas, that is, tat, nef, and the defective rev protein itself, but cannot express any of the proteins encoded by incompletely spliced viral mrnas, including gag, pol, and env. the transcripts encoding these viral structural proteins, however, can be detected in the nucleus of cells infected by rev-deficient viruses, where they are either degraded or eventually fully spliced and then exported (cullen, 2003). rev function requires a highly structured rna target, located in the hiv-1 env gene, called the rev response element or rre (malim et al., 1989). the rre contains a single, high-affinity rev-binding site and also functions as a scaffold for the multimerization of rev on viral mrnas. rev in turn interacts with a cellular factor called crm1 that belongs to the karyopherin family of nucleocytoplasmic transport proteins (fischer et al., 1995). this interaction is mediated by a leucine-rich motif located toward the carboxyl terminus of rev that was the first nuclear export signal (nes) to be identified and is the prototype of the leucine-rich class of ness. karyopherin function is regulated by the action of a g protein called ran, which, like all g proteins, is active when bound by gtp and inactive when bound by gdp (kohler and hurt, 2007). cells contain high levels of a ran-specific guanine nucleotide exchange factor (gef) in the nucleus and of a ran-specific gtpase activating protein (gap) in the cytoplasm. as a result, ran:gtp is largely nuclear and ran:gdp is mainly cytoplasmic. ran:gtp binds to crm1 in the nucleus and activates the binding of crm1 to leucine-rich ness. 
the ribonucleoprotein complex, consisting of ran:gtp, crm1, and rev, that forms on the hiv-1 rre directs incompletely spliced hiv-1 transcripts to the nuclear pore complex and then into the cytoplasm, where hydrolysis of the gtp moiety by cytoplasmic gap disassembles this complex. although rev was the first nuclear mrna export factor to be identified, it soon became clear that crm1 is not required for the nuclear export of most cellular mrnas. in fact, crm1 is involved largely in the nuclear export of small nuclear rnas (snrnas) and preribosomal subunits, as well as in protein nuclear export (kohler and hurt, 2007). so which factors are required for the export of cellular mrnas? an important part of the answer emerged from analysis of a second retrovirus called mason-pfizer monkey virus (mpmv). mpmv has a simpler genomic organization than hiv-1 and only encodes the three structural proteins gag, pol, and env. nevertheless, mpmv expresses both a genome-length gag/pol mrna and a spliced env mrna. as mpmv does not encode a rev homolog, how do the incompletely spliced genomic mpmv mrnas reach the cytoplasm? this question led to the discovery of an rna stem-loop structure within the mpmv genome, the constitutive transport element (cte), that mediates the nuclear export of incompletely spliced mrnas in the absence of any viral proteins (bray et al., 1994). further analysis revealed that the cte recruits a heterodimer of two cellular proteins, tap and p15, that also plays a critical role in the nuclear export of the majority of cellular mrnas (grüter et al., 1988; kohler and hurt, 2007). normally, the tap/p15 heterodimer is only recruited to mature, fully spliced mrnas. 
however, the cte is able to prematurely recruit tap/p15 to partially spliced mrnas and thereby circumvents the nuclear retention of incompletely spliced mpmv mrnas. although ctes have now been defined in several other exogenous and endogenous retroviruses, not all ctes act by directly recruiting tap/p15. in particular, the avian leukemia virus cte does not appear to bind to tap or p15 directly, although tap may be required for its function (leblanc et al., 2007) . further analysis may reveal new insights into how the export of retroviral nuclear mrnas is regulated. after an mrna is exported to the cytoplasm, it must recruit cellular ribosomes in order for the translation of the encoded open reading frame to occur (figure 1 ). picornaviruses presented two mysteries in terms of how these pathogenic viruses are able to translate the single large polyprotein encoded by their positive-sense rna genome. first, the single genome-length picornavirus mrna is uncapped. second, infection by picornaviruses such as poliovirus results in the efficient translation of viral mrnas, yet cellular mrna translation is largely blocked. so, why is this uncapped viral mrna translated more efficiently than capped cellular mrnas? the key discovery that led to the resolution of this conundrum was the identification of the poliovirus internal ribosome entry site (ires), a ~450 nucleotide (nt) highly structured rna element found in the 5′ untranslated region (5′utr) of poliovirus mrnas (jang et al., 1988; pelletier and sonenberg, 1988) . the ires directly recruits several eifs and the 40s ribosomal subunit to an internal viral translation initiation codon without the requirement for either cap binding or 5′utr scanning. as a result, poliovirus translation is independent of the host cell cap recognition factor eif4e. 
moreover, although poliovirus translation initiation does require eif4g, it functions perfectly well with the carboxy-terminal fragment of eif4g that is generated by the proteolytic cleavage of eif4g by a virus-encoded protease. because cap-dependent translation requires full-length eif4g, this cleavage blocks host cell translation, whereas viral mrna translation is not only unimpeded but in fact is enhanced by the access of viral mrnas to the entire pool of available eifs and ribosomal subunits (martinez-salas et al., 2008). subsequent work has demonstrated that all picornaviruses as well as some flaviviruses, including hepatitis c virus (hcv), contain ires elements. surprisingly, these exist in several functionally distinct classes. for example, the hcv ires, unlike the poliovirus ires, can directly recruit 40s ribosomal subunits to the viral internal translation initiation codon in the absence of eifs, although eifs do participate in the process of translation initiation (martinez-salas et al., 2008). an even more unusual ires is found in cricket paralysis virus (crpv), a picornavirus-like insect virus (jan et al., 2003; pestova and hellen, 2003). the crpv ires not only is able to recruit both the 40s and 60s ribosomal subunits to assemble elongation-competent 80s ribosomes on viral mrnas but also acts as a mimic of met-trnai to permit initiation of the translation of viral capsid proteins in the absence of met-trnai. although ires elements were first discovered in rna viruses, a subset of cellular mrnas are now known to also contain iress. interestingly, iress seem to be especially prevalent in mrnas whose expression is activated by stress, when cap-dependent translation may be inefficient. many ires-containing host mrnas encode proteins that protect cells from stress, whereas the proteins encoded by other ires-containing cellular mrnas seem to be important during apoptosis (bushell et al., 2006; komar and hatzoglou, 2005). 
another interesting translational phenomenon observed in several virus families, including many species of retroviruses and all coronaviruses, is programmed ribosomal frameshifting (brierley and dos ramos, 2006) . in retroviruses such as hiv-1, frameshifting prevents some ribosomes from terminating translation at the end of the open reading frame (orf) encoding the gag structural protein and instead induces ribosomes to shift into the overlapping pol orf, resulting in the production of the large gag-pol polyprotein. similarly, in coronaviruses, ribosomal frameshifting is used to produce the 1a/1b replicase polyprotein rather than the shorter 1a variant. frameshifting is induced by a bipartite element consisting of a 5′ frameshifting site and an adjacent 3′ rna structure (brierley and dos ramos, 2006; jacks et al., 1988) . the frameshift site has the consensus sequence x_xxy_yyz (where the translational phase is indicated), which then slips into the −1 frame, that is, xxx_yyy_z. the actual shift sites found in hiv-1 and the sars coronavirus are u_uuu_uua and u_uua_aac, respectively. the 3′ rna structure found in hiv-1 is thought to be a simple rna hairpin but other −1 frameshifting signals instead contain a pseudoknot 3′ to the frameshift signal (brierley and dos ramos, 2006) . it has been proposed that the function of the 3′ rna structure is to induce transient ribosomal pausing at the frameshift site to facilitate ribosomal slippage in the −1 direction. although frameshifting in hiv-1 occurs with an efficiency of ~5%, frameshifting efficiency in other viruses can be as high as ~25% and may be facilitated by a direct interaction between the paused ribosome and the downstream pseudoknot structure. programmed frameshifting is not unique to viruses but is found also in a small number of cellular genes in both eukaryotes and bacteria (shigemoto et al., 2001; tsuchihashi and kornberg, 1990) . 
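as a minimal illustrative sketch (not taken from the cited work), the following python snippet checks whether a heptanucleotide fits the x_xxy_yyz consensus described above and lists the codons read in the 0 and −1 frames. the function names is_slippery and frames are hypothetical, and the check encodes only the consensus as stated here; it does not model the downstream rna structure or predict frameshifting efficiency.

```python
from typing import Tuple

Codons = Tuple[str, str]


def _clean(heptamer: str) -> str:
    """Drop the frame-marking underscores and normalize to upper case."""
    return heptamer.replace("_", "").upper()


def is_slippery(heptamer: str) -> bool:
    """Return True if a 7-nt site fits the X_XXY_YYZ consensus.

    Positions 1-3 must be one repeated nucleotide (X) and positions 4-6
    another repeated nucleotide (Y, which may equal X). Position 7 (Z) is
    left unconstrained, an assumption that follows the consensus as written.
    """
    s = _clean(heptamer)
    if len(s) != 7 or set(s) - set("ACGU"):
        return False
    return len(set(s[0:3])) == 1 and len(set(s[3:6])) == 1


def frames(heptamer: str) -> Tuple[Codons, Codons]:
    """Return the codons read in the 0 frame and in the -1 frame."""
    s = _clean(heptamer)
    zero_frame = (s[1:4], s[4:7])       # X_XXY_YYZ -> XXY, YYZ
    minus_one_frame = (s[0:3], s[3:6])  # XXX_YYY_Z -> XXX, YYY
    return zero_frame, minus_one_frame


if __name__ == "__main__":
    for site in ("U_UUU_UUA", "U_UUA_AAC"):
        print(site, is_slippery(site), frames(site))
```

applied to the two shift sites quoted above, u_uuu_uua (hiv-1) and u_uua_aac (sars coronavirus), both fit the consensus under these assumptions.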
viruses, rna interference, and micrornas
rna interference (rnai) was first discovered by genetic analysis in nematodes (fire et al., 1998); however, it is likely that rnai first evolved as an innate immune response to viral infection. indeed, rnai continues to represent a key component of the antiviral response in plants and invertebrates (cullen, 2006). the triggers for rnai in these species are the long double-stranded rnas (dsrnas) that form critical intermediates in the replication of all rna viruses except retroviruses. these dsrnas are bound by the rnase iii-related enzyme dicer, which progressively cleaves these dsrnas into ~22 bp rna duplexes containing terminal 2 nt 3′ overhangs (see review by r.w. carthew and e.j. sontheimer in this issue of cell). one strand of this duplex, called a small-interfering rna (sirna), is then incorporated into the rna-induced silencing complex (risc), where it acts as a guide rna to target risc to complementary regions of viral genomic, anti-genomic, or mrna species. risc then cleaves these viral rnas, leading to their degradation. the first sirnas to be identified were in fact antiviral sirnas produced in tobacco cells infected by a pathogenic rna virus, potato virus x (hamilton and baulcombe, 1999). rnai is a critical component of the antiviral immune response in plants and invertebrates, but emerging evidence indicates that rnai responses to viral infection are not induced in mammalian somatic cells (cullen, 2006). instead, mammalian cells have evolved other innate responses that are induced by viral dsrnas, including the interferon response. because of the importance of rnai as an antiviral defense in plants and insects, many rna viruses that infect these species have evolved gene products that inhibit rnai and, hence, enhance virus replication. 
conversely, the absence of antiviral rnai responses in mammalian cells means that the rnai machinery in these cells generally remains active during viral infection (figure 1) . although the role of rnai as an antiviral response appears to have been lost in mammalian somatic cells, the residual rnai machinery still plays a very important role by mediating the function of cellular micrornas (mirnas). unlike sirnas, which are derived from long dsrnas (frequently of exogenous origin), mirnas are encoded within the cell's genome as part of one arm of an ~80 nt rna hairpin located in a larger rnap ii transcript called a primary mirna (bartel, 2004) . after excision, by the sequential action of the host cell rnase iii-related enzymes drosha and dicer, mirnas are loaded into risc and downregulate the expression of cellular mrnas. unlike viral mrna targets of viral sirnas, cellular mrnas are rarely fully complementary to cellular mirnas. as full complementarity is a prerequisite for efficient cleavage by risc, cellular mrnas are generally not subject to degradation by cellular mirnas. instead, cellular mirnas can induce the translational inhibition of cellular mrnas by binding to partially complementary target sites (bartel, 2004) . as most mammalian viruses do not seem to interfere with the loading or function of risc, mirnas remain active in infected cells, thus offering the possibility for viruses to use the cellular rnai machinery to regulate cellular or viral gene expression by programming risc with viral mirnas. analysis of a range of virally infected cells has revealed that several dna viruses, including herpesviruses, encode multiple distinct mirnas. of note, most viral mirnas appear to be processed by the same drosha and dicer dependent pathway used by the majority of cellular mirnas, although there are a few examples of viral mirnas that are transcribed by rna polymerase iii, not rnap ii, and then excised directly by dicer (gottwein and cullen, 2008) . 
similarly, riscs programmed by viral mirnas appear to function in the same way as riscs programmed by cellular mirnas. although beyond the scope of this article, it is interesting to note that several cellular and viral mrna targets of viral mirnas have now been defined (gottwein and cullen, 2008). in general, it appears that viral mirnas either downregulate cellular or viral genes that increase the sensitivity of virally infected cells to host innate or adaptive immune responses or, in the case of herpesvirus mirnas, stabilize viral latency by downregulating the expression of the viral immediate early proteins, which favor entry into the lytic replication cycle. in addition to mirnas, a number of dna viruses also encode long noncoding rnas that play a role in regulating viral replication and pathogenesis (table 2; reviewed in sullivan and cullen, 2009; see review by c.p. ponting, p.l. oliver, and w. reik in this issue of cell). but how do viruses use noncoding rnas to promote their replication? one interesting noncoding rna is the latency associated transcript (lat) of hsv-1. the instability of the 6.3 kb lat rna appears to be due to the fact that it is processed into several viral mirnas that may play a key role in regulating hsv-1 latency (umbach et al., 2008). the role of the stable 2 kb lat intron is less clear, but evidence has been presented arguing that the lat intron is exported out of the nucleus by crm1 and associates with cellular ribosomes, thus suggesting a role in modulating mrna translation in neurons latently infected with hsv-1 (atanasiu and fraser, 2007). another interesting viral noncoding rna is the polyadenylated nuclear (pan) rna encoded by kaposi's sarcoma-associated herpesvirus (kshv). pan is an unspliced rnap ii transcript that is the most highly expressed viral rna during lytic kshv infection, comprising up to 80% of all viral rnas. remarkably, the function of this rna in the viral life cycle is still unclear. 
however, recent data demonstrate that pan contains a novel ~80 nt-long rna element that stabilizes pan rna in the infected cell nucleus. insertion of this viral rna element in cis also increases the nuclear abundance of cellular mrnas, such as β-globin mrnas, that are normally unstable when expressed in an intronless form (conrad et al., 2006). it is unclear whether this element simply stabilizes pan rnas or whether it also acts in trans to stabilize other kshv-coding mrnas, which are also largely intronless. finally, several viral noncoding rnas seem to be inhibitors of cellular innate antiviral immune responses. for example, the β2.7 noncoding rna encoded by human cytomegalovirus (hcmv) binds to mitochondrial enzyme complex i of the host cell. this interaction stabilizes the production of atp in infected cells and also inhibits virally induced apoptosis (reeves et al., 2007). another noncoding rna, the va1 rna expressed by adenovirus, also binds to a cellular factor to inhibit an antiviral response. in this case, the target is protein kinase r (pkr), a cellular protein that binds to the long dsrnas produced by adenoviruses and many other pathogenic viruses. binding of dsrna by pkr induces pkr dimerization and autophosphorylation as well as phosphorylation of cellular eif2α, which results in a global inhibition of translation in the infected host cell. va1, a highly structured ~160 nt-long noncoding rna, binds to pkr with high affinity and blocks pkr dimerization and activation. this prevents the inhibition of translation induced by adenovirus-derived dsrnas and allows virus replication to proceed unimpeded (mathews and shenk, 1991). interestingly, both hcmv β2.7 and adenovirus va1, like kshv pan and hsv-1 lat, are also expressed at high levels in infected cells. β2.7 comprises up to 20% of all viral transcripts in hcmv-infected cells, and adenovirus va1 is expressed at an extraordinarily high number of copies (~10^8) per infected cell. 
presumably, these high expression levels facilitate the saturation of cellular binding sites for these rnas. efforts to understand the replication cycles of viruses are often motivated by the pathogenic potential of these intracellular parasites. however, such analyses have also led to several key insights into how not only infected cells but also uninfected cells regulate the expression of their genome. moreover, as noted in the brief discussion of viral noncoding rnas, our knowledge of how virally encoded transcripts work in the host cell remains far from complete. clearly, future research into virus replication will provide unexpected and exciting insights into the complex molecular machinery that makes human cells tick.
key: cord-284609-1q75zw6b authors: king, andrew m.q.; mccahon, david; slade, william r.; newman, john w.i. title: recombination in rna date: 1982-07-31 journal: cell doi: 10.1016/0092-8674(82)90454-8 sha: doc_id: 284609 cord_uid: 1q75zw6b abstract the aphthovirus genome consists of a single molecule of single-stranded rna that encodes all the virus-induced proteins. we isolated recombinant aphthoviruses from cells simultaneously infected with temperature-sensitive mutants of two different subtype strains. analysis of the proteins induced by 16 independently generated recombinants revealed two types of protein pattern, which were consistent with single genetic crossovers on the 5′ side and 3′ side, respectively, of the central p34-coding region. recombinants invariably inherited all four coat proteins from the same parent, and novel recombinant proteins were not observed. rnaase t1 fingerprints of virus rna, prepared from representatives of each recombinant type, confirmed the approximate crossover sites that had been deduced from the inheritance of proteins. 
these fingerprints provide molecular evidence of recombination at the level of rna and demonstrate the potential of rna recombination for producing genetic diversity among picornaviruses. although there is plenty of opportunity within diploid cells for the exchange of genetic information between homologous rna sequences, it is difficult to test for low levels of recombination among normal messenger rnas. rna viruses, however, provide a sensitive system for detecting recombinants. in addition, recombination could have functional or evolutionary importance for rna viruses, particularly for those with unsegmented genomes. the possibility of genetic recombination in such viruses was first suggested many years ago by hirst (1962) and ledinko (1963), who showed that infection of cells with a mixture of inhibitor-sensitive variants of poliovirus resulted in an enhanced yield of resistant progeny that were genetically stable. 
similar observations were made with aphthovirus (or foot-and-mouth disease virus) in this laboratory (pringle, 1965). however, with the exception of retroviruses, which are presumed to recombine as a dna intermediate, recombination has not been demonstrated in any other family of unsegmented rna virus despite extensive investigation. there are two possible explanations for the apparent recombination seen in picornaviruses: either the enhancement in the yield of resistant progeny is due to recombination between mutant genomes or it is due to complementation between them that leads to an increase in the yield of wild-type revertants. although this question has not, until now, been resolved conclusively, several lines of evidence favor the existence of recombination. first, recombination frequencies are additive. linear genetic maps were constructed for both poliovirus (cooper, 1968; cooper, 1977) and aphthovirus (lake et al., 1975; mccahon et al., 1977). biochemical mapping of mutations has recently given us an insight into the physical basis of the aphthovirus genetic map; loci within the left-hand half, in the middle, and at the right-hand end of the genetic map appear to be correlated with their respective physical locations near the 5' end (king and newman, 1980; king et al., 1980), middle (saunders and king, 1982) and 3' end of the genome (lowe et al., 1981; for review, see mccahon, 1981). second, we have obtained biochemical evidence of genetic recombination by crossing temperature-sensitive mutants of aphthovirus that possess second-site mutations affecting polypeptide charge. three different pairwise crosses produced ts+ viruses with polypeptide markers from both parents (king et al., 1982). third, the phenomenon of enhancement is seen even when the parental viruses have defects in the same polypeptide. 
two aphthovirus mutants, ts22 and ts115, have been shown, by a variety of chemical and enzymological methods, to encode a temperature-sensitive rna polymerase (lowe et al., 1981). yet infection with a mixture of these mutants resulted in a significantly enhanced yield of ts+ virus (mccahon and slade, 1981). such mutations would be unlikely to complement each other, since the polymerase of picornaviruses functions as a monomeric polypeptide (flanegan and baltimore, 1979). all the work reviewed above was confined to crosses between mutants of the same strain. to assess the evolutionary implications of recombination between picornaviruses, it is important to find out whether genetic information can be exchanged between evolutionarily divergent strains of virus. recent improvements in the sensitivity of recombination assays (mccahon and slade, 1981) have opened the way to a study of recombination between viruses having distinct nucleotide sequences. this paper describes a cross between two subtype strains of aphthovirus that has provided the first biochemical demonstration of genetic recombination at the level of rna. two aphthovirus strains of serotype o were chosen for studies of genetic recombination: pacheco of subtype o1, and v1 of subtype o6. the former was chosen because most previous studies had involved crosses between mutants of this strain, and the latter because its rna was readily distinguishable from that of the former by rnaase t1 fingerprinting, and the approximate genomic locations of the oligonucleotides were known. this paper describes a genetic cross between temperature-sensitive mutants of the two subtype strains. the o1 parent, ts33, had a temperature-sensitive mutation that was associated with an altered structural polypeptide vp2 (king et al., 1980), encoded near the 5' end of the genome. for the other parent, a mutant of subtype o6 with a temperature-sensitive mutation at the 3' end of the genome was sought. 
several spontaneous temperature-sensitive mutants of subtype o6 were isolated and screened by electrofocusing polypeptides induced in virus-infected cells. one mutant, ts302, induced an altered nonstructural polypeptide, p56a, which is encoded at the 3' end of the genome. figure 1 illustrates the electrofocusing patterns produced by the polypeptides of the two wild-type strains and the parental mutants, ts33 and ts302. conditions of labeling and sample preparation were designed to give simple patterns of stable virus-specific polypeptides, for ease of comparison in one dimension. polypeptides were identified by two-dimensional electrophoresis as described below. as figure 1 shows, the o1 temperature-sensitive mutant differed from the o1 wild-type only in vp2, and in the precursor of vp2, p38, whereas the o6 temperature-sensitive mutant differed from the o6 wild-type only in p56a. the latter alteration was shown to be due to the temperature-sensitive defect by examining the polypeptides of ts+ revertants of the o6 parent. each of six spontaneous revertants demonstrated covariation between the charge change in p56a and the temperature-sensitive defect (data not shown). significantly, none of these revertants showed any resemblance to the o1 polypeptide pattern. ts+ progeny were isolated by using the sensitive infectious center method (mccahon and slade, 1981), and the results are shown in table 1. in addition to the intersubtypic cross (cross 3), each parent was crossed with a temperature-sensitive mutant belonging to the same virus strain. in cross (1), ts33 was crossed with ts03, which has a genetic locus (mccahon et al., 1977) within the region representing the rna polymerase gene (lowe et al., 1981). the genetic location of ts303, used in cross (2), was not known, but several ts+ revertants of ts303 were found to have an electrophoretically altered vp3, suggesting that this mutant had a temperature-sensitive defect of the coat proteins. 
thus, in all three crosses in table 1, parent a was believed to carry a coat protein temperature-sensitive lesion, whereas parent b carried a temperature-sensitive lesion in p56a. in each cross, infection with a mixture of mutants greatly enhanced the proportion of cells yielding ts+ progeny over that observed from singly infected controls. however, the proportion was not as high when the parents belonged to different subtypes (cross 3) as when isogenic parents were used (cross 1 and 2).
of proteins
viruses that were ts+ were cloned at 41 °c and were characterized initially by electrofocusing virus-induced polypeptides. we have previously shown this technique to be extremely sensitive to variation even among closely related aphthoviruses (king et al., 1981).
figure legend: virus-infected baby hamster kidney cells were labeled with 35s-methionine for 30 min, followed by a 30 min chase with unlabeled methionine, and virus-specific polypeptides were precipitated with antiserum as described previously (king et al., 1981). the origin of electrofocusing is on the left.
as figure 1 shows, large differences between the subtypes were seen in the isoelectric points of all six major polypeptides, and these differences were exploited as markers for studying recombination. since the o1 parent was defective in vp2 and the o6 parent in p56a, ts+ recombinants would be expected to inherit an o6 vp2 and an o1 p56a. out of 22 ts+ progeny, 16 possessed these hybrid characteristics. since each clone was derived from a separate infected cell, these recombinant viruses were all products of independent genetic crossovers. the recombinants were of two types, termed ret 1 and ret 2, and an example of each is shown in figure 1. two were of the ret 1 variety, having inherited all their structural polypeptides, vp1, vp2, vp3 and p38, from the o6 parent, and both nonstructural polypeptides, p34 and p56a, from the o1 parent. 
the other 14 recombinants were of the ret 2 type, in which all six polypeptides except p56a were inherited from the o6 parent. the remaining six ts+ progeny were indistinguishable from the o6 wild-type and were presumably revertants of the o6 parent, ts302. an alternative possibility is that these ts+ viruses were produced by pairs of genetic crossovers on either side of the o6 temperature-sensitive mutation. ret 1 was the most interesting type of presumptive recombinant, since it appeared to have been generated either by a genetic crossover in the middle of the genome or by a minimum of two separate mutations. a more complete analysis of the induced polypeptides of ret 1 was therefore undertaken by two-dimensional electrophoresis of whole cytoplasmic extracts prepared from pulse-labeled virus-infected cells, as shown in figure 2. under the conditions used, most of the virus-induced polypeptides, precursors as well as products, were labeled. identification of the larger virus polypeptides in two-dimensional gels has been described by king et al. (1982). the small polypeptides p12, p16, p20a and p20b were identified from their electrophoretic mobilities in the sds gel dimension. in both subtypes, p20a appeared as two spots differing in isoelectric point. the two subtypes were compared by running a mixture of polypeptide extracts (o1 + o6, figure 2). differences between them are seen as pairs of polypeptide spots. the results confirmed the differences seen earlier in one dimension (figure 1) and revealed further subtype-specific differences in all the other virus-coded polypeptides except for p12, p20b and p88. the absence of a difference in p88, the precursor of the structural proteins, was presumably because differences among its constituent polypeptides cancelled each other out. 
to improve resolution of the smaller polypeptides, all six analyses of figure 2 were repeated with a higher concentration of polyacrylamide in the sds gel (not shown); these gels confirmed the conservation of p12 and p20b. the acidic polypeptide vp4 does not enter these electrofocusing gels, but the isoelectric point of this polypeptide is known to be highly conserved (king et al., 1981). differences among the other polypeptides were predominantly in the electrofocusing (horizontal) dimension. thus the polypeptides of the two subtype strains differed greatly in isoelectric point but not in size. the subtype origins of ret 1 polypeptides were determined by running mixtures of ret 1 and parental extracts. as can be seen from the ret 1 + o1 mixture, polypeptides p16, p20a, vp1, vp3 and p38 of the recombinant differed from their o1 counterparts in exactly the same way as the two parents differed from each other, showing that these five polypeptides (indicated by open arrows on the ret 1 gel) were inherited from the o6 parent. in contrast, the same mixture gave rise to only single polypeptide spots for the other nonconserved polypeptides, p34, p52, p56a, p72 and p100, showing that these were inherited from the o1 parent (solid arrows on the ret 1 gel). as expected, the situation was reversed in the ret 1 + o6 mixture: the two viruses were indistinguishable in the former group of polypeptides, but were different in the latter. mature aphthovirus polypeptides are generated by cleavage of precursor polypeptides, which are synthesized from a single messenger rna that has the same sense as genomic rna, as shown in figure 3. the location of p12 and p20b on the biochemical map (a. m. q. king and k. saunders, unpublished work) will be described elsewhere; positions of the other polypeptides are as reported by sangar (1979). figure 3 also summarizes our knowledge of the parental origins of the ret 1 polypeptides. 
it is clear that this recombinant virus inherited the polypeptides encoded in the 5' half of the genome from its subtype o6 parent, but inherited those in the 3' half of the genome from its o1 parent, consistent with a single genetic crossover in the middle of the genome. the inheritance of p16 and p20a is of interest, since they have no analog among other picornaviruses. our results, which show that these nonstructural polypeptides are linked genetically to the coat proteins, agree with a 5'-coding position determined biochemically (sangar et al., 1980).
of t1 oligonucleotides
the provisional identification of recombinants on the basis of induced polypeptides was confirmed by rna fingerprinting. virion rna from a representative of each of the recombinant types (ret 1 and ret 2) was compared with the parental rnas by two-dimensional electrophoresis of rnaase t1 digests. a mixture of parental rnas, o1 + o6, was also analyzed, and all five fingerprints are reproduced in figure 4a. the complexity of the mixture showed that there were many differences between the rnaase t1-resistant oligonucleotides of these two subtype strains. the diagrammatic version in figure 4b indicates 22 oligonucleotides unique to the o1 component of the mixture, and at least 19 oligonucleotides unique to the o6 component. the rna fingerprint of the o6 mutant in figure 4a is virtually identical to that of the wild-type o6 strain previously published by harris et al. (1980); all but one of the 19 o6-specific oligonucleotides correspond to spots on the published wild-type pattern, and these are identified in figure 4b with use of the same numbering system. (the exception is a spot, or group of spots, at the bottom left of the diagram that appears more complex than in the version previously published.) the fingerprint of ret 1 rna resembled neither parent, although all the ret 1 oligonucleotides were present in the o1 + o6 mixture. 
of the 22 o1-specific oligonucleotides, nine were represented in the ret 1 pattern, 11 were missing and the parental origins of the two largest oligonucleotides, poly(c) and poly(a), could not be assigned. similarly, of the 18 numbered o6 oligonucleotides, ten were present in the ret 1 rna and six were missing. thus the oligonucleotide composition shown in figure 4b confirmed that both subtype strains contributed genetic information to ret 1 in approximately equal amounts. ret 2 rna also contained a recombinant set of oligonucleotides, which included four from the o1 rna and 14 of the numbered oligonucleotides in o6 rna (figure 4b). this confirmed the conclusions of the polypeptide analysis that most of the genetic information of ret 2 was inherited from the o6 parent. the locations of the oligonucleotides in o6 rna, as reported by harris et al. (1980), are shown in figure 4b (bottom). although their method of mapping oligonucleotides was rather imprecise, the data of harris et al. confirm very clearly that the recombinants inherited the 5' end of o6 rna, but not the 3' end. thus all ten of the o6 oligonucleotides that were present in ret 1 rna are located in the 5' half of the genome, whereas the six missing oligonucleotides are heavily concentrated towards the 3' end. similarly, the only two oligonucleotides that were missing from ret 2 rna are located, as expected, at the extreme 3' end. when these oligonucleotide maps are aligned with the biochemical maps of the proteins (figure 4b), the two sets of data agree with each other within the limits of mapping precision. the inheritance of oligonucleotides by ret 1 and ret 2 indicates that each recombinant genome was generated by either a single genetic crossover or, conceivably, several crossovers close together. it will not be possible to decide between these alternatives until the oligonucleotides have been located more precisely in the parental nucleotide sequences. 
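the logic of inferring a crossover site from the inheritance of markers can be sketched in a few lines of python. this is an illustrative sketch only, not the authors' procedure: the function name crossover_intervals and the three-marker map are assumptions chosen for brevity, using only the marker order given in the text (coat protein vp2 near the 5' end, p34 in the middle, p56a at the 3' end), with the two parental subtypes written as O1 and O6.

```python
from typing import Dict, List, Tuple

# Marker order from 5' to 3', as described in the text (an abbreviated map:
# only three of the polypeptide markers are included here for illustration).
MARKER_ORDER: List[str] = ["VP2", "P34", "P56a"]


def crossover_intervals(
    origins: Dict[str, str], order: List[str] = MARKER_ORDER
) -> List[Tuple[str, str]]:
    """Return the adjacent marker pairs between which parental origin switches.

    Each returned pair brackets at least one crossover; crossovers that fall
    between the same two markers in even numbers leave no trace here.
    """
    intervals = []
    for left, right in zip(order, order[1:]):
        if origins[left] != origins[right]:
            intervals.append((left, right))
    return intervals


# Parental origin of each marker in the two recombinant types and in a
# presumed revertant of the O6 parent, as reported in the text.
ret1 = {"VP2": "O6", "P34": "O1", "P56a": "O1"}
ret2 = {"VP2": "O6", "P34": "O6", "P56a": "O1"}
revertant = {"VP2": "O6", "P34": "O6", "P56a": "O6"}

if __name__ == "__main__":
    for name, virus in (("ret 1", ret1), ("ret 2", ret2), ("revertant", revertant)):
        print(name, crossover_intervals(virus))
```

under these assumptions ret 1 shows a single switch between vp2 and p34 and ret 2 a single switch between p34 and p56a, matching crossovers on the 5' and 3' sides of the p34-coding region; additional closely spaced crossovers between adjacent markers cannot be excluded by this kind of analysis.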
however, our results leave little room for doubt that recombination between rna molecules really can occur, for several reasons. first, the aphthovirus genome consists of a single molecule of single-stranded rna, and most of the fingerprints in this study were obtained from full-length fractions of virus rna. second, the rna is replicated in the cytoplasm with no requirement for dna synthesis. third, the hybrid oligonucleotide patterns of ret 1 and ret 2 showed no evidence of virus mixtures. finally, the same genetic crossovers were inferred independently from the inheritance of virus-coded proteins and were as predicted from the locations of the parental temperature-sensitive lesions. by using the infectious center method, up to one third of cells infected with a mixture of temperature-sensitive viruses have yielded ts+ progeny (mccahon and slade, 1981). since most ts+ progeny of the intersubtypic cross proved to be genuine recombinants, rna recombination appears to be a normal concomitant of picornavirus replication. it should be emphasized that this cross is far from unique. we are currently exploiting recombination between these two virus strains for the purpose of mapping guanidine-resistance mutations. thus far, crosses between six different pairs of mutants have been carried out and each has yielded viruses with recombinant genomes (k. saunders, unpublished observations). is the process limited to picornaviruses? there is little evidence of recombination among other rna viruses, although several examples of sequence rearrangements in rna have been discovered in recent years. these include both specific rearrangements, such as the splicing of nuclear rna and the deletion of genomic sequences from coronavirus messenger rnas (lai et al., 1981), and apparently nonspecific rearrangements, as exemplified by the rna of defective interfering (di) particles. 
the latter can be extremely complex (lehtovaara et al., 1981) , and one di rna of influenza virus has even been found to consist of sequences derived from more than one genome segment (fields and winter, 1982) . although splicing involves the breakage and rejoining of an rna chain, explanations for the production of di rnas usually invoke a "promiscuous" polymerase, which jumps from one region of template to another without terminating replication (lazzarini et al., 1981) . in principle, either mechanism might be involved in genetic recombination. however, there is an essential difference between recombination as described here and any of the molecular rearrangements referred to above, in that production of a functional recombinant requires some mechanism for bringing together separate rna molecules in a precise manner. the appropriate juxtaposition of homologous sequences would occur whenever rna molecules from two different viruses become hybridized to a single complementary strand. this situation could arise in various ways, for example, by a negative strand of rna hybridizing to virion rnas derived from both parents, or, alternatively, by hybridizing to a nascent positive strand in the replicative intermediate. if splicing enzymes could act on such structures, the former would give rise to a recombinant positive strand, whereas the latter would give rise to a recombinant negative (template) strand. however, all models of this type require the existence of free single-stranded rna of negative sense, for which there is as yet no evidence in picornaviruses. the existence of a genetic recombination map implies that crossovers can occur at many different places in the aphthovirus genome; indeed, the assumption behind recombination mapping is that crossovers are randomly located. 
in contrast, only a limited variety of recombinants was produced by crossing two different subtype strains, despite the fact that the parental temperature-sensitive mutations were at opposite ends of the genome. this suggests that recombination between subtypes is restricted to limited regions of the genome, a limitation reflected in the much lower recombination frequency from this cross compared with the result obtained with isogenic parents. more work will be needed to define the nature of these constraints, although it may be significant that the only crossovers that did occur were in regions coding for the conserved polypeptides, p12 and p20b. the fact that all the polypeptides of the 16 recombinants resembled one or other of the parental electrophoretic types suggests that there may be functional constraints on the generation of recombinant polypeptides with novel biochemical properties. what purpose does rna recombination serve? one possibility is that recombination acts as an rna repair mechanism by reconstructing complete genomes from overlapping fragments. the rescue of genetic markers from ultraviolet-inactivated viruses (pringle, 1965 ) and the surprisingly low dependence of recombination frequency on multiplicity of infection (mccahon and slade, 1981) give some credence to this idea. our finding that genetic information can be exchanged between different virus subtypes raises a second possibility: that recombination is a natural source of genetic diversity among aphthoviruses. thus recombination could play the same evolutionary role for unsegmented rna viruses that gene reassortment appears to for segmented viruses such as myxoviruses (scholtissek et al., 1978) and reoviruses (sugiyama et al., 1981) . the origin of the wild-type aphthovirus strain, o1 pachaco, and its chemically induced mutant, ts33, have been described by lake et al. (1975) . the wild-type strain o6v1 was provided by d. j. rowlands and was cloned at 41 °c before use. 
the other temperature-sensitive mutants were spontaneous and were isolated by the method of mccahon et al. (1977) . electrofocusing was performed in gels containing ph 3.5-10 ampholine ampholytes as described by king et al. (1982) . rna fingerprints 32p-labeled virus was grown in baby hamster kidney monolayers in phosphate-free earle's saline containing 0.01 m tris-hcl (ph 7.6), 1 µg/ml actinomycin d and 200 µci/ml 32p-orthophosphate. virus was purified as described by king and newman (1980), except that the sucrose gradient was in tne buffer (0.15 m nacl, 0.05 m tris-hcl and 5 mm edta, ph 7.4) instead of phosphate buffer. the rna was extracted with a 1:1 mixture of phenol and chloroform. for all the viruses except ret 2, full-length 35s rna was isolated by centrifugation on a 5%-25% sucrose gradient in tne buffer containing 0.1% sds. purified rna was digested with rnase t1 and subjected to two-dimensional electrophoresis as described by harris et al. (1980) . we thank t. j. r. harris for his help with the rna fingerprinting, k. saunders for access to unpublished work and d. j. rowlands for helpful discussions. the costs of publication of this article were defrayed in part by the payment of page charges. this article must therefore be hereby marked "advertisement" in accordance with 18 u.s.c. section 1734 solely to indicate this fact. received january 11, 1982; revised february 26, 1982 a genetic map of poliovirus temperature-sensitive mutants genetics of picornaviruses evidence of genetic recombination in foot-and-mouth disease virus the replication of picornaviruses location of the initiation site for protein synthesis on foot-and-mouth disease virus rna by in vitro translation of defined fragments of the rna guanidine-resistant mutants of aphthovirus induce the synthesis of an altered non-structural polypeptide p34 on the origin of the human influenza virus subtypes h2n2 and h3n2 plenum press), pp. 133-207. fields, s. and winter, g. (1982) . 
nucleotide sequences of influenza virus segments 1 and 3 reveal mosaic structure of a small viral rna segment. cell 28, 303-313. flanegan, j. b. and baltimore, d. (1979) key: cord-258372-w0j0n8mn authors: gibson, erin m.; bennett, f. chris; gillespie, shawn m.; güler, ali deniz; gutmann, david h.; halpern, casey h.; kucenas, sarah c.; kushida, clete a.; lemieux, mackenzie; liddelow, shane; macauley, shannon l.; li, qingyun; quinn, matthew a.; roberts, laura weiss; saligrama, naresha; taylor, kathryn r.; venkatesh, humsa s.; yalçın, belgin; zuchero, j. bradley title: how support of early career researchers can reset science in the post-covid19 world date: 2020-06-12 journal: cell doi: 10.1016/j.cell.2020.05.045 sha: doc_id: 258372 cord_uid: w0j0n8mn the covid19 crisis has magnified the issues plaguing academic science, but it has also provided the scientific establishment with an unprecedented opportunity to reset. shoring up the foundation of academic science will require a concerted effort between funding agencies, universities, and the public to rethink how we support scientists, with a special emphasis on early career researchers. the novel coronavirus, sars-cov-2, has placed science at the center of every conversation, amplifying the importance of scientific research to economic stability, healthcare infrastructure, and disaster preparedness. in academic science, recovery from the immediate covid19 crisis will require departments, universities, private foundations, federal agencies, and the public to work together collaboratively and comprehensively. 
the goal of recovery should not be to return to ''normal'' but, rather, to reset. here, we argue that recovery provides us with the opportunity to address three systemic issues that plague the conduct of research in the twenty-first century, with an emphasis on supporting early career researchers who are the most vulnerable. the strategies needed to ensure stability and success of early career scientists post-covid19 can be adapted to chip away at the systemic issues affecting the scientific establishment. science has changed immensely over the past 50 years. more has become better: more experiments per paper, more papers per year, more expectations and requirements for grants and tenure, more opinions from reviewers. the scientific community rewards quantity over quality. most scientists can easily name a seminal paper; many were published long before the 2000s, and many had, at most, a handful of figures. today, papers are often published with a plethora of supplemental figures that will largely go unread and underappreciated. the desire for ''more'' results in delays in publication, the awarding of grants, and career advancement for early career researchers; it also stymies creativity and encourages the proliferation of low-quality journals. this crisis is exacerbating the well-documented discrimination afflicting academic science (monroe et al., 2008) . women, parents, and individuals who identify as racial or ethnic minorities leave science, technology, engineering, and math (stem) fields as early career researchers at an excessively high rate in the best of times and will undoubtedly suffer more from the present lab closures. the responsibilities of family life disproportionately impact women. a parent who is trying to homeschool their children, manage household duties, and work will have left little time to further their own scientific agenda. faculty with family responsibilities, women specifically, must be supported. 
the covid19 crisis will only highlight the rampant diversity issues plaguing the scientific establishment, many of which begin with the loss of women and minorities during early career stages and may lead to further disenfranchisement of the disadvantaged (malisch et al., 2020) . the current model of academic science is heavily reliant upon federal funding, even though agencies such as the national institutes of health (nih) were not built to sustain such expectations. the federal government's funding capacity has significantly diminished as the cost of science has radically increased. the 2019 defense budget was $685 billion while the 2019 nih budget was $39 billion. the covid19 crisis has clearly amplified that the greatest risk to american life is not war, but disease. funding is needed at all levels; however, early career researchers should be particularly supported as the consistent trend of shifting funding away from younger researchers has no end in sight (daniels, 2015) . ensuring a durable future for academic science post-covid19 recovery from the immediate covid19 crisis necessitates a multi-pronged approach including fiscal and non-fiscal strategies to help graduate students, postdoctoral fellows, and early and later career faculty. this pandemic has particularly impacted senior postdoctoral fellows seeking academic faculty positions and early career faculty seeking to establish themselves as independent investigators. special consideration for these early career researchers is key to overcoming the crisis and strengthening the foundations of academic science. our action plan proposed below is not an exhaustive list of all possible recommendations for supporting scientists, nor is it inclusive of every academic scientist's specific circumstance. not all of our suggestions are applicable at every university or institution, as each will have its own unique set of challenges. 
we acknowledge that monetary support will be limited due to the deteriorating economic situation and drastic loss of revenue from clinical operations for most medical campuses. while the immediate goal of the recommendations is to provide support for scientists from funding agencies, universities, departments, and the public following covid19, this support also provides solutions to the three major challenges. solutions to these systemic issues (i.e., excess does not equal excellence, diversification leads to discovery, or funding agencies) are interwoven across the structure of academic science, allowing us to comprehensively tackle these issues at all levels. plans for recovery from the covid19 pandemic must ensure as much continuity as possible in research while improving upon existing infrastructures in order to provide a more inclusive, cohesive, and efficient future for the next generation of independent scientists. the resiliency of research is dependent upon the support of funding agencies. like the broader scientific community, funding agencies will need to adapt their strategies and structure to fit the changing times. simplification of grant application processes, including fewer supplemental documentations and more implementation of letter-of-intent formats prior to full proposals, could increase efficiency for both the funding agency and researcher. lab closures will undoubtedly create a void in the preliminary data that are necessary to obtain most awards. early career researchers who had less time to acquire these data prior to lab shutdowns will be the most affected. funding agencies could introduce policies and programs targeted at early investigators that require fewer preliminary data (similar to the national institute of mental health [nimh] brain research through advancing innovative neurotechnologies [brain] initiative r01 or the dp2), reducing the excess in data required for most grants. 
grants submitted by graduate students, postdoctoral fellows, and early career faculty who do not have sufficient preliminary data per current standards should be given special consideration. currently, many of the new funding opportunities by funding agencies, such as the nih, are geared toward supplements to existing grants or covid-related research. as there will likely be restrictions or reductions to new funding opportunities in the coming years due to fiscal shortages, faculty with existing grants might help early career faculty by including them in their supplemental applications. including early career faculty will also foster collaboration and resource sharing, both of which will be vital during this time (excess does not equal excellence and rethink the fundamentals of funding). extension of deadlines, timelines, and funding numerous funding agencies have already implemented deadline extensions, but deadlines must be further extended for the duration of lab disruptions. it is also imperative that funding agencies extend early investigator status for grant applications and implement no-cost extensions for currently held grants. additional bridge funding programs may be especially important for faculty who are between projects or aiming to switch areas of study following the covid19 crisis. extensions for tenure: faculty most universities have added one-year extensions to the tenure tracks of early career researchers, but sliding extensions may better support the success of vulnerable academics. many early career investigators may request extensions during lab closures, but they should also have the ability to go up for tenure early if the opportunity arises. ensuring the promotion and advancement of marginalized groups such as women, who make up < 30% of stem faculty, is even more imperative post-covid19. 
covid19-initiated resetting of expectations for the publishing, teaching, mentorship, and service requirements for tenure may not only help minimize the excesses innate to the current tenure structure, but also may help foster environments that can acknowledge implicit biases and keep marginalized groups from disproportionately leaving stem fields. tenure expectations for the next generation of early career researchers may need to account for increased variability between faculty that is exacerbated by the covid19 crisis and allow for more flexibility in the process. this crisis has amplified how the antiquated one-size-fits-all guidelines only encourage the disenfranchisement of women and racial or ethnic minorities (diversification leads to discovery and excess does not equal excellence). the current crisis will have a dramatic trickle-down effect, and numerous hiring freezes are already in place. mechanisms to allow postdoctoral fellows or graduate students in their final year to continue in their current positions should be enacted, if necessary, and if labs or universities are able to provide fiscal support. current closures are also disrupting the ability of many graduate students to complete their rotations. universities could extend the timeline for rotations and potentially cover graduate students' stipends. trainees, particularly postdoctoral fellows, may have limited ability to extend their period of training due to visa restrictions. universities should coordinate with federal agencies to pursue strategies aimed at extending visa expiration timelines, allowing trainees to complete work that was delayed due to the covid19 crisis. 
these mechanisms are needed to assure that we do not lose an entire generation of scientists following the coronavirus crisis. curtailment of applicable hiring freezes many universities have implemented hiring freezes for faculty and staff for the remainder of the year or beyond. universities should not limit the ability of early career faculty to hire postdoctoral fellows and staff, however. restricting early career faculty from hiring technical assistance and lab managers will stymie their ability to generate preliminary data, which will consequently limit grant and paper submissions and delay career advancement. even a short hiring freeze could have devastating effects on the ability of early career faculty labs to succeed. allowing early career faculty to continue hiring will also help to ease the bottleneck of graduate students looking for postdoctoral or research scientist positions within the next few years. hiring freezes at any level will disproportionately affect early career individuals and oversaturate the market with qualified candidates. permitting ongoing interviews for faculty positions, even if the official hire date is postponed, could alleviate stress on the postdoc population and expedite the hiring process when hiring freezes are lifted. the faculty search process serves as a valuable feedback mechanism for postdoctoral fellows that sometimes has an impact on career path. halting all hiring and all faculty searches may drive talented postdocs, especially women and members of ethnic or racial minorities, out of academia (diversification leads to discovery). although universities may curtail spending from institutional funds, special consideration should be given to new and early career faculty. early career faculty must retain access to their startup packages during this time. institutional funds should be released for salary support for early career faculty and for all staff, students, and trainees in their labs. 
if startup funds are set to expire, the expiration date should be extended. new faculty should be given the funds needed to establish their labs once research activities resume (rethink the fundamentals of funding). the economic toll caused by shelter-in-place will undoubtedly be significant, including the reduction in funding through endowments and charitable giving. we fully acknowledge that monetary supplementation may be difficult for universities following the covid19 crisis. any combination of fiscal supplementation with other mechanisms of non-fiscal support should be considered. universities might implement new or expanded fellowships for postdocs and graduate students, add to existing startup packages for faculty, assist with the purchasing of equipment or expand shared equipment funding, or create subsidies or joint ventures with federal programs similar to unemployment or re-deployment programs. universities might supplement pay or provide reimbursement for staff, postdoc, and graduate student salaries for the duration of academic closures. many universities have per diem policies that differ based on funding source, with reduced per diem costs associated with federal grants. early career faculty without federal funding have per diem costs double that of other labs. universities could implement mechanisms to reduce or supplement animal costs that will be accrued during lab closures and when labs reopen and expand their animal colonies (rethink the fundamentals of funding). onsite daycare facilities support postdoctoral fellows and faculty with young children. these family care centers are critical to narrowing the gap and slowing the attrition of women and parents in science. universities could work with early childhood education programs to establish or expand daycare and preschool programs, providing free or subsidized childcare for faculty and teaching opportunities for early education majors. 
universities might also reach out to current or retired teachers seeking supplemental income (diversification leads to discovery). universities should encourage and enable graduate students and postdocs to use this time to learn new computational skills in anticipation of reductions in ability to do work at the bench. many university-offered computational courses were over-committed during lab closures due to a significant increase in enrollment requests. universities should make a concerted effort to increase bandwidth and capacity for computational courses. many free online resources are also available to supplement the acquisition of coding skills. administrative and teaching expectations should be reevaluated during university closures. departments should reassess administrative and teaching loads, especially for early career faculty whose promotions are contingent upon teaching requirements. this is especially important, since female scientists generally have increased teaching loads and more advisory expectations than male scientists (gibney, 2017) , which could disproportionately delay scientific recovery of female scientists from covid19 closures (diversification leads to discovery). the covid19 crisis and subsequent lab closures will take an incredible toll on mental health. early career faculty who have yet to establish themselves or their research independently and postdocs whose future job prospects are now significantly limited will be especially impacted by prolonged lab shutdowns. department chairs, division leaders, and mentors should do their best to check in with early career faculty and postdocs during this time. mentoring will be key both during and after this crisis. establishing scheduled virtual meetings during social distancing and in-person meetings after labs are reopened could help alleviate some mental stress. university mental health resources are also available for anyone who needs support. 
as students generally contact female faculty about mental health issues more frequently than male faculty (bennett, 1982) , equal encouragement of mentorship from all faculty is essential to not overburdening women faculty during this time (diversification leads to discovery). mentoring graduate students throughout lab closures and after reopening should be strongly encouraged. those conducting experiments will be most affected by lab closures, and this should be explicitly acknowledged by faculty and mentors. universities must assure graduate students that graduate programs will be stabilized and that admittance will not be decreased. for many faculty, graduate students are the major workforce of the lab. to ensure that faculty can successfully build and sustain a lab, continued ability to attract graduate students is necessary. this is especially important for new investigators, as getting postdoctoral fellows can be more challenging for newer faculty. once labs are reopened, pairing early career faculty with a later career faculty mentor of an established lab could facilitate more effective research programs and allow for resource sharing. later career faculty could be incentivized to help early career researchers through reductions in teaching or administrative loads, supplementations to animal care costs, core facility usages, or other means of reimbursement and/or subsidizations. investment of later career faculty in the success of early career faculty will help to ensure stability and success in the younger generation of independent researchers. faculty who have clinical responsibilities also necessitate special consideration during this time, especially if they are on the front lines. these individuals will not only lose productivity due to lab closures and curtailment of patient enrollment in clinical trials, they will also have the extra physical and mental stressors of working in the hospital during a crisis. 
establishment of protocols to aid clinician-scientists is imperative to ensuring their important contributions to science. just as senior faculty mentoring will be critical for junior faculty and graduating postdocs to successfully transition to a post-covid era in the basic sciences, this type of mentorship protocol may be even more critical for clinician-scientists, many of whom do not have doctorates beyond the medical degree. make science a national priority the current crisis has brought the importance of science and research to the forefront of public life. not only is science critical for public health decision-making, but a sustained investment in research better positions political leaders to efficiently deploy testing and therapeutic solutions. capitalizing this momentum is crucial to engaging the public in science and science funding. providing additional funding sources focused on conveying science to the greater public and stimulating interest in science through educational outreach is critical. exploiting technology and social media to bring science and research directly to the public will be vital in the post-covid19 world. such technology might include mechanisms to allow private citizens to directly invest in science and scientists (else, 2019; miller, 2019) , including simplified website-based donation platforms or inclusion on election ballots. this is necessary for establishing new funding sources for scientists, potentially supplementing the dearth of funding for early career researchers at federal funding agencies (rethink the fundamentals of funding). the covid19 crisis has revealed a lack of public understanding about how science is funded, conducted, and reported. the current administration's belief that the nih is ''giving away $32 billion a year'' should be cause for concern (deyoung et al., 2020). 
much of the mistrust evident between the scientific establishment and the general population is rooted in lack of transparency and community involvement in science. taking scientists out of the ''ivory tower'' and increasing accessibility through technology may help to assuage the mistrust that hinders our preparedness in times of crisis. people cannot support what they do not understand. removing excess requirements in publishing, grantsmanship, and tenure expectations could have the added benefit of creating more time for scientists to interact in the public domain. scientists must work on building the trust that is imperative to success as a community, and early career scientists are primed to help pave this new future (excess does not equal excellence). figure 1. the covid19 crisis has magnified the systemic issues plaguing academic research. these include the often stifling excess requirements in publication, tenure, and grant processes; the reliance on funding from national agencies that is catered towards senior level researchers; and the lack of diversity in academic research due to the attrition of women and racial or ethnic minorities during early career stages. beyond the immediate challenges of returning to laboratories and research careers, the covid19 crisis has exposed some of the underlying weaknesses and problems that permeate the current scientific enterprise (figure 1). for example, editors are asking reviewers to not request more experiments unless absolutely necessary to validate the core claims of a manuscript during the review process. most are applauding this effort to minimize excess and calling for its continued implementation even after scientists are able to get back to the bench. all institutions, funding agencies, departments, and members of the scientific community should speak openly and honestly about the difficulties faced during the current situation. 
early career researchers should be involved in the decision-making processes, as they represent the future of science and academic leadership. the covid19 crisis has provided us with the unique opportunity to reflect upon the present norms and enact change through fiscal and non-fiscal strategies. our hope is that this pandemic will allow us to chart a new course for science, both academically and socially, and to begin to address the core challenges of research, with a special focus on supporting the next generation of independent scientists. student perceptions of and expectations for male and female instructors: evidence relating to the question of gender bias in teaching evaluation a generation at risk: young investigators and the future of the biomedical workforce americans at world health organization transmitted real-time information about coronavirus to trump administration. the washington post, available from crowdfunding research flips science's traditional reward model teaching load could put female scientists at career disadvantage. nature in the wake of covid-19, academia needs new solutions to ensure gender equity the best platforms for crowdfunding science research. the balance: small business gender equality in academia: bad news from the trenches, and some possible solutions dr. roberts serves as editor-in-chief of books for the american psychiatric association publishing division and as editor-in-chief of the journal academic medicine. unrelated to this publication, dr. roberts serves as an advisor for the bucksbaum institute of the university of chicago pritzker school of medicine and owns the small business terra nova learning systems. key: cord-287349-1zcq7kzx authors: chen, james; malone, brandon; llewellyn, eliza; grasso, michael; shelton, patrick m.m.; olinares, paul dominic b.; maruthi, kashyap; eng, ed t.; vatandaslar, hasan; chait, brian t.; kapoor, tarun; darst, seth a.; campbell, elizabeth a. 
title: structural basis for helicase-polymerase coupling in the sars-cov-2 replication-transcription complex date: 2020-07-28 journal: cell doi: 10.1016/j.cell.2020.07.033 sha: doc_id: 287349 cord_uid: 1zcq7kzx summary sars-cov-2 is the causative agent of the 2019-2020 pandemic. the sars-cov-2 genome is replicated and transcribed by the rna-dependent rna polymerase holoenzyme (subunits nsp7/nsp82/nsp12) along with a cast of accessory factors. one of these factors is the nsp13 helicase. both the holo-rdrp and nsp13 are essential for viral replication and are targets for treating the disease covid-19. here we present cryo-electron microscopic structures of the sars-cov-2 holo-rdrp with an rna template-product in complex with two molecules of the nsp13 helicase. the nidovirus-order-specific n-terminal domains of each nsp13 interact with the n-terminal extension of each copy of nsp8. one nsp13 also contacts the nsp12-thumb. the structure places the nucleic acid-binding atpase domains of the helicase directly in front of the replicating-transcribing holo-rdrp, constraining models for nsp13 function. we also observe adp-mg2+ bound in the nsp12 n-terminal nidovirus rdrp-associated nucleotidyltransferase domain, detailing a new pocket for anti-viral therapeutic development. nsp13 is an sf1b helicase, which translocates on single-stranded nucleic acid in the 5'->3' direction (saikrishnan et al., 2009) . in vitro studies confirm this direction of translocation for the nidovirus helicases (adedeji et al., 2012; bautista et al., 2002; seybert et al., 2000a, 2000b; tanner et al., 2003) . unless the interaction of nsp13 with the holo-rdrp alters the unwinding polarity, which seems unlikely, the structural arrangement observed in the nsp13-rtc ( figure 2d) figure 5b ). in place of the bridge helix, the viral rdrp has conserved motif f [sars-cov-2 nsp12 residues 544-555; (bruenn, 2003) ], which comprises a β-hairpin loop. 
motif f directs the t-rna to the top, while underneath motif f is a channel that appears able to accommodate single-stranded nucleic acid. the analogous structural arrangement leads us to propose that the sars-cov-2 rdrp may backtrack, generating a single-stranded rna segment at the 3'-end that would extrude out the rdrp secondary channel [...] (table s1; video s1). in the structure, the primary interaction determinant of the helicase with the rtc occurs between the nsp13-zbds and the nsp8-extensions. both of these structural elements are unique to nidoviruses, and the interaction interfaces are conserved within α- and β-cov genera (figure 3), indicating that this interaction represents a crucial facet of sars-cov-2 replication/transcription. a protein-protein interaction analysis for the sars-cov-1 orfeome (which recapitulates the nsp13-rtc interactions observed in our structure) identified nsp8 as a central hub for viral protein-protein interactions (brunn et al., 2007). the structural architecture of nsp8a and nsp8b, with their long n-terminal helical extensions, provides a large binding surface for the association of an array of replication/transcription factors (figure 2c, e). our structure reveals adp-mg2+ occupying the niran domain active-site (figure 4c), presumably because the sample was incubated with adp-alf3 prior to grid preparation. the adp makes no base-specific interactions with the protein; nsp12-niran-h75 forms a cation-π interaction with the adenine base (figure 4c), but this interaction is not expected to be strongly base-specific, and structural modeling does not suggest obvious candidates for base-specific interactions.
the position corresponding to h75 in the niran domain a_n alignment is not conserved (figure 4a), suggesting that i) this residue is not a determinant of base-specificity for the niran domain active site, ii) the niran domain base-specificity varies among different nidoviruses, or iii) niran domains in general do not show base-specificity in their activity. the niran domain of the eav-rdrp appeared to prefer u or g for its activity (lehmann et al., 2015a). we note that the niran domain enzymatic activity is essential for viral propagation but its target is unknown (lehmann et al., 2015a). further experiments will be required to understand more completely the niran domain activity, its preferred substrate, and its in vivo targets, and these may vary among different nidoviruses. our results provide i) a structural basis for biochemical, biophysical, and genetic experiments to investigate these questions, and ii) a platform for anti-viral therapeutic development. our analysis comparing the viral rdrp with cellular ddrps revealed a remarkable structural similarity at the polymerase active sites: immediately downstream of each polymerase active site is a conserved structural element that divides the active site cleft into two compartments, directing the downstream nucleic acid template into one compartment and [...] in the cellular ddrps, the conserved structural element that divides the active site cleft is the bridge helix (lane and darst, 2010), and the secondary channel serves to allow ntp substrates to access the ddrp active site and to also accommodate the single-stranded 3'-rna fragment generated during backtracking (figure 5a). in the viral rdrp, the downstream strand-separating structural element is the motif f β-hairpin loop. as for multisubunit ddrps, the rdrp secondary channel is perfectly positioned to accommodate backtracked rna (figure 5b).
based on this structural analogy, we propose that the viral rdrp may undergo backtracking and that the single-stranded 3'-rna fragment so generated would extrude out the viral rdrp secondary channel (figure 5b). we note that backtracking of φ6 and poliovirus rdrps has been observed experimentally (dulin et al., 2015, 2017). ignoring sequence variation, the energetics of backtracking by the cellular ddrps are close to neutral since the size of the melted transcription bubble and the length of the rna/dna hybrid in the active site cleft are maintained (any base pairs disrupted by backtracking are recovered somewhere else). for the sars-cov-2 rdrp, the arrangement of single-stranded and duplex nucleic acids during replication/transcription in vivo is not known, but in vitro the rdrp synthesizes p-rna from a single-stranded t-rna, resulting in a persistent upstream p-rna/t-rna hybrid. in this case backtracking is energetically disfavored since it only shortens the product rna duplex without recovering duplex nucleic acids somewhere else. however, our structural analysis of the nsp13-rtc indicates that nsp13.1 can engage with the downstream single-stranded t-rna (figures 2d, s6c). translocation of the helicase on this rna strand would proceed in the 5'->3' direction, in opposition to the 3'->5' translocation of the rdrp on the same rna strand. this aspect of helicase function could provide the ntp-dependent motor activity necessary to backtrack the rdrp. in cellular organisms, ddrp backtracking plays important roles in many processes, including the control of pausing during transcription elongation, termination, dna repair, and fidelity (nudler, 2012). two potential roles for backtracking in sars-cov-2 replication/transcription include: 1) fidelity and 2) template-switching during sub-genomic transcription.
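the base-pair bookkeeping behind the energetic argument above can be made explicit with a toy calculation. the sketch below is only an illustration under a strong simplifying assumption (every base pair counts equally, which is what "ignoring sequence variation" implies); the function name and its arguments are hypothetical and are not taken from the study.

```python
def net_base_pair_change(backtrack_steps: int, pairs_recovered_elsewhere: bool) -> int:
    """Net change in base pairs after backtracking by `backtrack_steps` nucleotides.

    Toy bookkeeping only (an assumption, not a thermodynamic model): each
    backtracked position disrupts one base pair at the 3' end of the product
    strand. If the complex maintains its bubble size and hybrid length, one
    base pair is re-formed elsewhere for each one that is lost.
    """
    if backtrack_steps < 0:
        raise ValueError("backtrack_steps must be non-negative")
    lost = backtrack_steps
    regained = backtrack_steps if pairs_recovered_elsewhere else 0
    return regained - lost


# Cellular ddrp: bubble size and hybrid length are maintained, so the change is zero.
ddrp_change = net_base_pair_change(3, pairs_recovered_elsewhere=True)

# rdrp copying a single-stranded template in vitro: the product duplex only shortens.
rdrp_change = net_base_pair_change(3, pairs_recovered_elsewhere=False)

print(ddrp_change, rdrp_change)
```

in this simplified picture the net loss for the rdrp grows with every backtracked position, which is why an external ntp-dependent motor such as the helicase is proposed as the driving force.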
backtracking by the cellular ddrps is favored when base pairing in the rna/dna hybrid is weakened by a misincorporated nucleotide in the rna transcript (nudler et al., 1997). the efficiency with which the holo-rdrp can negotiate downstream obstacles to elongation is unknown. our structure suggests that the nsp13 helicase could act in the 5'->3' direction on the t-rna to disrupt stable rna secondary structures or downstream rna binding proteins, both of which could be significant impediments to rna elongation (figure 6b). the helicase may function in this role distributively in order to avoid interfering with rdrp translocation. alternatively, in the case of a fully duplex rna template, the helicase could act processively to unwind the downstream duplex rna, much like replicative helicases, such as dnab in escherichia coli, processively unwind the dna duplex in front of the replicative dna polymerase (kaplan and o'donnell, 2002). finally, cov transcription includes a discontinuous step during the production of sub-genomic rnas (sg-transcription; figure 6c) that involves a remarkable template-switching step unique to nidoviruses (sawicki and sawicki, 1998). the process produces sg-rnas that are 5'- and 3'-co-terminal with the virus genome. in this process, transcription initiates from the 3'-poly(a) tail of the +-strand rna genome [cyan rna in figure 6c ...]. the oligonucleotides used in this study are listed in table s2. all constructs were verified by sequencing (genewiz). nsp7/8. the coding sequences of the e. coli codon-optimized sars-cov-2 nsp7 and nsp8 genes (gblocks from integrated dna technologies) were cloned into a pcdfduet-1 vector (novagen). nsp7 bore an n-terminal his6-tag that was cleavable with prescission protease (ge healthcare life sciences).
[...] figure s3d) for the nsp13₂-rtc particles indicated that the map (and resolution estimations) were corrupted by severe particle orientation bias. nsp13-rtc (chapso). the entire dataset consisted of 4,358 motion-corrected images with 1,447,307 particles (figure s4a). particles were sorted using cryosparc 2d classification (n=100), resulting in 344,953 curated particles. initial models (seed 1: complex, seed 2: decoy 1, seed 3: decoy 2) were generated using cryosparc ab initio reconstruction on a subset of the particles (10,509 particles from the first 903 images). particles were further curated using seeds 1-3 as 3d templates for cryosparc heterogeneous refinement (n=3), then re-extracted with a box size of 320 px, and followed by another round of heterogeneous refinement (n=3) using seed 1 as a template. the resulting 91,058 curated particles were sorted into three classes using cryosparc heterogeneous refinement (n=3). each class was further sorted using cryosparc ab initio reconstruction (n=3) to separate distinct 3d classes. using these classes as references for heterogeneous refinement (n=6), multi-reference classification was performed on the 91,058 curated particles. classification revealed three unique classes: (1) nsp13-rtc, (2) nsp13₂-rtc, (3) (nsp13₂-rtc)₂.
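to make the attrition through the curation steps above easier to follow, the particle counts quoted in the text can be turned into retention fractions. this is a minimal sketch that uses only the three counts stated above; the stage labels and the helper name are illustrative assumptions, and no cryosparc functionality is involved.

```python
# Particle counts as quoted in the text; stage labels are descriptive only.
STAGES = [
    ("picked particles", 1_447_307),
    ("after 2d classification", 344_953),
    ("after heterogeneous refinement", 91_058),
]


def retention_table(stages):
    """Return (label, count, fraction of previous stage, fraction of first stage)."""
    first = stages[0][1]
    previous = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


for label, count, step_fraction, overall_fraction in retention_table(STAGES):
    print(f"{label:32s} {count:>9,d}  step {step_fraction:6.1%}  overall {overall_fraction:6.1%}")
```

read this way, roughly a quarter of the particles survive each of the two curation stages, so only a few percent of the initially picked particles enter the final multi-reference classification.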
particles within each class were further processed using [...] structural basis of transcription: rna polymerase backtracking and its; phenix: a comprehensive python-based system for macromolecular structure solution; mechanism of nucleic acid unwinding by sars-cov helicase; coronavirus susceptibility to the antiviral remdesivir is mediated by the viral polymerase and the proofreading exoribonuclease; transcription regulatory sequences and mrna expression levels in the coronavirus transmissible gastroenteritis virus; functional properties of the predicted helicase of porcine reproductive and respiratory syndrome virus; the global phosphorylation landscape of sars-cov-2 infection; rna 3'-end mismatch excision by the severe acute respiratory syndrome coronavirus nonstructural protein nsp10/nsp14 exoribonuclease complex; a structural and primary sequence comparison of the viral rna-dependent rna polymerases; one number does not fit all: mapping local variations in resolution in cryo-em reconstructions; aluminofluoride and beryllofluoride complexes: new phosphate analogs in enzymology; eliminating effects of particle adsorption to the air/water interface in single-particle cryo-electron microscopy: bacterial rna polymerase and chapso; molprobity: all-atom structure validation for macromolecular crystallography; aaa protein spastin using active site mutations; structural basis for the regulatory function of a complex zinc-binding domain in a replicative arterivirus helicase resembling a nonsense-mediated mrna decay helicase; coronaviruses: an rna proofreading machine regulates replication fidelity and diversity; molecular dynamics simulations related to sars-cov-2. d.e. shaw research technical data; the predicted metal binding region of the arterivirus helicase protein is involved in subgenomic mrna synthesis, genome replication, and virion biogenesis; a novel protein kinase-like domain in a selenoprotein, widespread in the tree of life; backtracking behavior in viral rna-dependent rna polymerase provides the basis for a second initiation site; signatures of nucleotide analog incorporation by an rna-dependent rna polymerase revealed using high-throughput magnetic tweezers. cell; hepatitis virus replication is decreased in nsp14 exoribonuclease mutants; infidelity of sars-cov nsp14-exonuclease mutant virus replication is revealed by complete genome sequencing; coot: model-building tools for molecular graphics; biochemical aspects of coronavirus replication and virus-host interaction; promoting elongation with transcript cleavage stimulatory factors; nidovirales: evolving the largest rna virus genome; remdesivir is a direct-acting antiviral that inhibits rna-dependent rna polymerase from severe acute respiratory syndrome coronavirus 2 with high potency; virus taxonomy. in family coronaviridae; crystal structure of middle east respiratory syndrome coronavirus helicase; from sars to mers: 10 years of research on highly pathogenic human coronaviruses; structure of replicating sars-cov-2 polymerase; dali server update; human coronavirus 229e nonstructural protein 13: characterization of duplex-unwinding, nucleoside triphosphatase, and rna 5′-triphosphatase; enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase; delicate structural coordination of the severe acute respiratory syndrome coronavirus nsp13 upon atp hydrolysis; structural basis of transcription arrest by coliphage hk022 nun in an; dnab drives dna branch migration and dislodges; structure of the sars-cov nsp12 polymerase bound to nsp7 and nsp8 co-factors; transcriptional arrest: escherichia coli rna polymerase translocates backward, leaving the 3' end of the rna intact and extruded; rna polymerase switches between inactivated and activated states by translocating back and forth along the dna and the rna; molecular evolution of multisubunit rna polymerases: structural analysis; cooperative translocation enhances the unwinding of duplex dna by sars coronavirus helicase nsp13; discovery of an essential nucleotidylating activity associated with a newly delineated conserved domain in the rna polymerase-containing protein of all nidoviruses; what we know but do not understand about nidovirus helicases; the embl-ebi search and sequence analysis tools apis in 2019; bayesian deconvolution of mass and ion mobility spectra: from binary interactions to polydisperse ensembles; discovery of an rna virus 3'->5' exoribonuclease that is critically involved in coronavirus rna synthesis; collaboration gets the most out of software; discovery of the first insect nidovirus, a missing evolutionary link in the emergence of the largest rna virus genomes; cell the register of transcription by preventing backtracking of rna polymerase; structure and function of the transcription elongation factor greb bound to bacterial rna polymerase; sequence requirements for rna strand transfer during nidovirus discontinuous subgenomic rna synthesis; analyzing resistance to design selective chemical inhibitors for aaa proteins; cryosparc: algorithms for rapid unsupervised cryo-em structure determination; high-throughput deconvolution of native mass spectra; a planarian nidovirus expands the limits of rna genome size; mechanistic basis of 5'-3' translocation in sf1b helicases; advances in experimental medicine and biology; relion: implementation of a bayesian approach to cryo-em structure determination; sequence logos: a new way to display consensus sequences; biochemical characterization of the equine arteritis virus helicase suggests a close functional; the human coronavirus 229e superfamily 1 helicase has rna and dna duplex-unwinding activities with 5′-to-3′ polarity; remdesivir and sars-cov-2: structural requirements at both nsp12 rdrp and nsp14 exonuclease active-sites; structure and mechanism of helicases and nucleic acid translocases; coronaviruses lacking exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics; thinking outside the triangle: replication; coronavirus rna synthesis and processing; continuous and discontinuous rna synthesis in coronaviruses; protein ampylation by an evolutionarily conserved; one severe acute respiratory syndrome coronavirus protein complex integrates processive rna polymerase and exonuclease activities; the severe acute respiratory syndrome (sars) coronavirus belongs to a distinct class of 5′ to 3′ viral helicases; transcriptional fidelity and proofreading by; identification and characterization of a human coronavirus 229e nonstructural protein associated rna 3'-terminal adenylyltransferase activity; structural
basis of transcription: backtracked rna polymerase ii at 3.4 angstrom resolution; structural basis for rna replication by the sars-cov-2 polymerase; therapeutic efficacy of the small molecule gs-5734 against ebola virus in rhesus monkeys; structural basis of transcription; nonstructural proteins 7 and 8 of feline coronavirus form a 2:1 heterotrimer that exhibits primer-independent rna polymerase activity; structural basis for inhibition of the rna-dependent rna polymerase from sars cov-2 by remdesivir; insights into sars-cov transcription and replication from the structure of the nsp7-nsp8 hexadecamer; crystal structure of thermus aquaticus core rna polymerase at 3.3 a resolution; the nsp1, nsp13, and m proteins contribute to the hepatotropism of murine coronavirus jhm.wu; motioncor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy; a pneumonia outbreak associated with a new coronavirus of probable bat origin; an insect nidovirus emerging from a primary tropical rainforest; new tools for automated high-resolution cryo-em structure determination in relion-3; sequence motifs involved in the regulation of discontinuous coronavirus subgenomic rna synthesis. [...] rtc) with nsp13 helicases • the nsp13 ntpase domains sit in front of the rtc, constraining functional models • nsp13 may drive rtc backtracking, thus impacting proofreading and template-switching • structural analysis of adp-mg2+-bound niran domain, a potential antiviral target. in brief chen et al. present cryo-em structures of the sars-cov-2 rna-dependent rna polymerase (rdrp) holoenzyme (nsp7/nsp8/nsp12) containing an rna template-product in complex with the viral helicase (nsp13). the work provides insight into the assembly and function of the multi-subunit protein machine and how [...] we thank a. aher, j. berger, r. landick and c.
rice for helpful key: cord-266480-u8o4eitu authors: colubri, andrés; kemball, molly; sani, kian; boehm, chloe; mutch-jones, karen; fry, ben; brown, todd; sabeti, pardis c. title: preventing outbreaks through interactive, experiential real-life simulations date: 2020-09-02 journal: cell doi: 10.1016/j.cell.2020.08.042 sha: doc_id: 266480 cord_uid: u8o4eitu operation outbreak (oo) is a simulation platform that teaches students how pathogens spread and the impact of interventions, thereby facilitating the safe re-opening of schools. in addition, oo generates data to inform epidemiological models and prevent future outbreaks. before sars-cov-2 was reported we repeatedly simulated a virus with similar features, correctly predicting many human behaviors later observed during the pandemic. introduction. as countries shut down by sars-cov-2 reopen, decision-makers are debating how best to resume all levels of education to mitigate further spread of the virus (vogel and couzin-frankel 2020; edmunds 2020). public health officials and school administrators have championed a wide range of interventions, including mask usage, social distancing, and small classes. the efficacy of these interventions depends on two key factors that are as yet unknown: (1) how likely each intervention is to modify behavior and transmission and (2) whether students and other stakeholders are educated, equipped, and empowered enough to remain compliant. here we present a new way to address both problems in an integrated manner. operation outbreak (oo) is an educational curriculum and simulation platform that uses bluetooth to spread a virtual "pathogen" in real-time across smartphones in close proximity. students engage with oo by first learning about key topics in outbreak prevention and response. they then participate in an app-facilitated outbreak simulation designed to vividly illustrate what they have learned.
finally, we administer post-simulation reflection and analytical exercises to reinforce key points that can inform students' future responses to real outbreaks. figure 1. a. oo simulations at sma. clockwise, from top left: "sick" student presenting to a health responder, epidemiologists analyzing outbreak data, members of the public health team bringing an "infected" student to the treatment center, "recovered" student showing her immune health status. b. components of the oo platform. the transmission model defines the parameters dictating the probabilistic spread of the virtual virus among participant phones. during simulated outbreaks, the oo platform automates real-time contact tracing by recording all "transmission events" between phones, as well as observable changes in behavior that result. this automation yields critical data that are often missing from standard real-life outbreak datasets. the data are accessible via a web-based dashboard where users can visualize real-time information on simulated infection and transmission patterns or view raw data for analysis. weeks before sars-cov-2 was first identified in humans (andersen et al. 2020), we ran several oo simulations that mimicked outbreaks of a very similar sars-like virus in which pre-symptomatic carriers caused a significant fraction of transmissions. other epidemiological parameters representing early sars and mers outbreaks (e.g., basic reproductive number, r0, of 2-3) were also programmed into the app. seeking to complicate traditional transmission dynamics (where participants know if they are sick), we built in asymptomatic transmission with high transmissibility to allow the virus to spread widely at the beginning of the simulations. these simulations took place in both school and conference settings, with hundreds of participants in close proximity.
the app-generated data from these simulations represented the "ground truth" of the mock outbreaks, captured several essential features of sars-cov-2, and allowed us to observe behavioral changes among participants--many of which are now being mirrored in real life. in this article, we describe the predictive power of our oo simulations, share the resulting epidemiological data, and propose ways to use oo to bring students back to campus safely by teaching them the fundamentals of pandemic response--a critical effort in fighting the current pandemic and preparing for the next one. initial design and use of oo. we created oo in collaboration with sarasota military academy (sma) preparatory school in 2015 as a two-week curriculum in pandemic preparedness, culminating in a class-wide outbreak simulation. we initially used stickers to "transmit disease." in late 2017, we introduced the oo app and platform, which triggers infection and recovery events using probabilities that can be flexibly configured, offering virtually limitless possibilities to simulate additional elements such as false-positive cases, clinically-diverse strains and personal protective equipment (ppe). the oo platform includes three interconnected components (figure 1): (1) the mobile app uses the proximity and location-sensing capabilities of smartphones to propagate the virtual pathogen. the app currently supports the use of bluetooth low energy (ble) beacons and qr codes, which can be used to represent zoonotic infectious sources, protective items (e.g., facemasks and hazmat suits), and other interventions that attenuate pathogen transmission (vaccines or therapies). (2) an administrator website enables organizers of simulations to set parameters for each simulation (e.g., number of participants, duration, symptoms, outcomes).
(3) a graphical dashboard retrieves data from simulated outbreaks (e.g., number of cases, transmission events, participant health status), and allows for visualizations, calculations, and other activities that develop skills in data science. the dashboard data can also be extracted for more sophisticated computational analyses. our oo app-based simulations at sma over the last five years have involved more than 180 eighth-grade students who took on roles as general population, clinical workers, epidemiologists, and government officials. their goal: to "win the game" by preventing the virtual pathogen from infecting more than a predetermined threshold of players. oo allows organizers to parameterize different outbreak scenarios with known pathogens, de novo pathogens based on real microbes, or even fictional diseases. for our 2018 simulation at sma, we chose ebola as the pathogen and configured the symptoms and fatality rate accordingly. in 2019, given reported risks of emerging respiratory viruses (cui et al. 2019), we simulated a coronavirus modeling the sars r0 of 2-3 (lipsitch et al. 2003) and the clinical symptoms of mers (assiri et al. 2013). we added one more key parameter: a period of asymptomatic transmission (twice the duration of the symptomatic period), to allow the virus to spread widely at the beginning of the game. in early december 2019, we simulated outbreaks of the sars-cov-2-like virus at sma (185 participants) and the annual retreat of the broad institute of mit and harvard (100 participants). we also simulated this virus in february 2020 at the day-long florida undergraduate research conference (furc); 260 of the 590 attendees installed the app to run an unsupervised simulation for the full conference.
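the kind of configurable, probability-driven transmission described above can be sketched in a few lines. the code below is not the oo app's implementation; it is a minimal illustration that assumes random pairwise "proximity contacts" per time step, a hypothetical per-contact transmission probability, and a hypothetical symptomatic period length. the only parameter taken from the text is that the asymptomatic period lasts twice as long as the symptomatic one.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class OutbreakParams:
    p_transmit: float = 0.2       # hypothetical per-contact transmission probability
    symptomatic_ticks: int = 5    # hypothetical length of the symptomatic period
    asymptomatic_factor: int = 2  # asymptomatic period = 2x symptomatic (from the text)

    @property
    def asymptomatic_ticks(self) -> int:
        return self.asymptomatic_factor * self.symptomatic_ticks

    @property
    def infectious_ticks(self) -> int:
        return self.asymptomatic_ticks + self.symptomatic_ticks


def status(elapsed: int, params: OutbreakParams) -> str:
    """Health status of a player `elapsed` ticks after infection."""
    if elapsed <= params.asymptomatic_ticks:
        return "asymptomatic"
    if elapsed <= params.infectious_ticks:
        return "symptomatic"
    return "recovered"


def simulate(n_players, n_ticks, contacts_per_tick, params, seed=0):
    """Return (transmission events, infection times) for one mock outbreak.

    Each event is (infector, infectee, tick), i.e. the "ground truth" record
    that automated contact tracing would capture.
    """
    rng = random.Random(seed)
    infected_at = {0: 0}  # player 0 is the single index case
    events = []
    for tick in range(1, n_ticks + 1):
        for _ in range(contacts_per_tick):
            a, b = rng.sample(range(n_players), 2)  # one random proximity contact
            for src, dst in ((a, b), (b, a)):
                if src not in infected_at or dst in infected_at:
                    continue
                elapsed = tick - infected_at[src]
                # elapsed == 0: src was infected during this tick, not yet infectious
                if 0 < elapsed <= params.infectious_ticks and rng.random() < params.p_transmit:
                    infected_at[dst] = tick
                    events.append((src, dst, tick))
    return events, infected_at


events, infected_at = simulate(60, 40, 30, OutbreakParams(), seed=1)
print(f"{len(infected_at)} of 60 players infected, {len(events)} transmission events")
```

because carriers are infectious for the whole asymptomatic window before anyone can observe symptoms, most early transmissions in such a model come from players who do not know they are sick, which is the dynamic the simulations were designed to expose.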
realistic outbreak scenarios predict population behavior and increase engagement. the socio-behavioral parallels between our past simulations and the current pandemic are striking. notably, oo simulations have repeatedly foreshadowed the political distrust and altercations that have increased alongside covid-19 in the us. they have vividly illustrated that viral outbreaks reveal and exacerbate existing rifts in society (kim and bostwick 2020) . for example, in one simulation, students acting as "government officials" tried to spread disinformation to manipulate public behavior. this strategy backfired when students acting as "media" discovered the truth and informed the general population. "citizens" who had previously complied with "government" orders immediately broke quarantine, further driving viral transmission. the government's refusal to properly "fund" its epidemiology team also drew widespread criticism--portending similar arguments now being made about fiscal allocation at all levels of the us government. in another simulation, a member of the student "police" was approached by a classmate who refused to comply with orders to disclose his infection status (as indicated on the app). the officer "shot" the student (with a nerf gun) for non-compliance. similar real-life incidents have been reported in multiple countries (snyder et al. 2020; hayes and seucharan 2020) . we have also consistently observed that student "family units" with fewer in-game "credits" (simulated money) are more likely to be infected and die than their more privileged counterparts. this functional inequality is likely because the less fortunate "families" regularly spend their tokens on periodic "food distribution," leaving little left over to purchase "ppe." we have simulated many of the interventions currently being considered for covid-19, such as face masks, ppe, and even vaccines. in some cases, these interventions initially caused problems of their own. 
for instance, when masks were first introduced, a group of students "bought" them in bulk and tried to sell them at higher prices, only relenting under "public pressure" (precisely as was observed with the hoarding of medical-grade masks, toilet paper, and disinfectants at the start of the covid-19 pandemic). however, in the long run, all three measures reduced infection transmission in the simulation, especially when given to highly-vulnerable participants (e.g., "healthcare workers"). students themselves have proved to be an organic test of other proposed initiatives. in each simulation, without prompting, they implemented social distancing and a form of "remote work" (photographing and sharing educational material online to limit physical interaction). they also developed a way to assess players' health status and limit movements accordingly, paralleling the real-world use of health/immunity passports and containment strategies. however, as trust in the "government" eroded, some students tried to "game" the game by faking their health status screenshots. active learning exercises like oo have been repeatedly shown to improve stem learning outcomes (balicer 2007; freeman et al. 2014). our preliminary pedagogical data suggest this is true for oo. average test scores for the oo unit are higher than those for other units at sma, across all genders and ethnicities. for the past three years, post-simulation survey data have shown that oo is the most anticipated lesson by all classes in any subject. students have been especially eager to play the roles of epidemiologists and triage workers. in the last two years, 70 of 185 students signed up for these roles; over half were female and 30% were underrepresented hispanic or black minorities. the role of simulation in exploring outbreak dynamics.
realistic simulated outbreaks provide a unique opportunity to capture not only behavioral changes in response to viral spread, but also the "ground truth" of transmission, i.e., documentation of every single event (fuller 2020). the oo app produces real-time anonymous "contact tracing" data using bluetooth, recording who "infects" whom and when, and the subsequent series of events for each participant, ending in "recovery" or "death." this data reflects the spread of the virtual pathogen among the participants with a granularity that is nearly impossible to replicate in the real world--and it can be used like real outbreak data for epidemiological modeling and visualization. it also allows us to quantitatively explore the effects of changing parameters (e.g., r0) and the impact of containment and prevention measures (e.g., social distancing and vaccination). our 2018 sma ebola simulation first showed how student social-distancing could affect an "outbreak's" trajectory (figure 2a) [...] more detailed data from the 2019 simulation allowed us to reconstruct transmission chains over time and identify important features of the outbreak, such as the existence of two super-spreaders causing 4 and 5 secondary infections early in the game (figure 2c). as with covid-19 (kupferschmidt 2020), these super-spreader events accounted for a significant fraction of cases--in the simulation, 30% of all secondary infections were caused by these two participants. the 2020 simulation at furc using sars-like parameters allowed us to explore the effect of herd immunity. only 40% of conference attendees installed the app, leaving susceptible players buffered from each other by non-participants--just as vaccinated or otherwise immune individuals buffer the more vulnerable from the transmission of real diseases. consistent with this observation, simulated transmission levels peaked throughout the day but never showed the exponential growth expected in an entirely susceptible population.
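the analyses described above (counting secondary infections, identifying super-spreaders, and tracing transmission chains) follow directly from a log of who infected whom and when. the sketch below assumes a hypothetical event format of (infector, infectee, time) tuples and uses invented toy data; it is not the oo dashboard's code or its actual data schema.

```python
from collections import Counter


def secondary_infections(events):
    """Count how many participants each infector infected."""
    return Counter(infector for infector, _infectee, _time in events)


def share_from_top_spreaders(events, k):
    """Fraction of all transmission events caused by the k most prolific infectors."""
    if not events:
        return 0.0
    counts = secondary_infections(events)
    top_total = sum(n for _player, n in counts.most_common(k))
    return top_total / len(events)


def chain_to_index_case(events, person):
    """Trace a participant back to the index case.

    Assumes each participant is infected at most once and the log is acyclic,
    which holds for a record in which infection precedes transmission.
    """
    infected_by = {infectee: infector for infector, infectee, _time in events}
    chain = [person]
    while chain[-1] in infected_by:
        chain.append(infected_by[chain[-1]])
    return chain


# Invented toy log, for illustration only.
toy_events = [
    ("p1", "p2", 1),
    ("p1", "p3", 2),
    ("p1", "p4", 3),
    ("p2", "p5", 4),
    ("p3", "p6", 5),
]

print(secondary_infections(toy_events).most_common(1))
print(share_from_top_spreaders(toy_events, k=1))
print(chain_to_index_case(toy_events, "p6"))
```

applied to a real simulation log, the second function with k=2 would give the share of secondary infections attributable to the two most prolific spreaders, the quantity reported above for the 2019 simulation.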
the furc data were particularly revelatory when paired with the conference program. the effective reproductive number as a function of time, rt, remained below 2--again, consistent with a population with significant herd immunity--but spiked during activities that required attendees to be in close proximity to each other: two presentation sessions (posters and oral), a workshop session, and lunch (figure 2d). a roadmap for the near future: pandemic education, preparedness, and data generation. the covid-19 pandemic presents a unique opportunity to rethink the way we educate students and other stakeholders about outbreak response--and to do so in a way that can facilitate the students' return to the classroom. we envision oo as playing two key roles: (1) as a pedagogical platform for teaching fundamentals of pandemic response that are vital for the public to understand and (2) as a novel system for simulating outbreaks and evaluating real-world mitigation strategies, including those needed to restart in-person education. we have already begun to leverage oo to help mitigate the covid-19 pandemic. in summer 2020, we partnered with the one summer chicago program to train 2,000 students as social distancing ambassadors. as part of the training, the ambassadors integrated the app into their daily lives over a seven-day period. each of three regions of the city was randomly seeded with the same number of index cases, while the app tracked social contacts and transmission events. the simulation results and post-simulation survey demonstrated that the students retained the knowledge they learned and had a significantly increased interest in public health careers after the program.
to further increase oo's realism, we are enhancing the platform with components specifically informed by and focused on sars-cov-2. these include:
• a multi-faceted "health score." this feature aggregates physical movement (quantified by a step counter or changes in gps location), social interactions (quantified by bluetooth proximity measurements), and infectious disease knowledge (quantified by quizzing users about outbreak science). this score influences participants' risk and recovery probabilities based on behaviors and responses during the simulation, effectively gamifying oo and incentivizing behaviors and responses that are beneficial during real-life pandemics, especially those in which underlying health conditions play an important role in determining outcomes.
• tools to evaluate response readiness. we are adding features that allow students and stakeholders to evaluate their mitigation strategies in real time based on changing data. one new feature lets stakeholders choose which individuals can be "diagnosed" given available in-game funding, and assess resulting efforts to track and trace. we will also allow for simulated changes in pathogen genetics. this feature will generate more realistic data on pathogen transmission and evolution, and will support oo's use in more advanced classes (e.g., genetic epidemiology courses).
• comprehensive educational curriculum on outbreak science.
we are developing a robust, modular, scalable curriculum on outbreak science in the form of an online and print textbook, online lectures, learning assessments, and an online video series. we are currently working on two curricula: one for middle schools and another for high schools and colleges. we have already begun pilots at schools across the us.
• remote learning capabilities, including add-ons to existing multiplayer online games. to account for the dramatically increased numbers of students now in remote learning, and to mimic disease transmission in close quarters, we have created options for people to play oo with family members at home. we also are working on an online multiplayer version of oo, inspired by the so-called "corrupted blood incident," a virtual and unintended pandemic in world of warcraft (wow) that occurred in 2005 due to an error in the game's code. epidemiologists later found many correlations between players' reactions to the virtual pandemic and documented historical responses to real outbreaks (balicer 2007), including failed quarantine attempts and a high potential for rapid global spread.
conclusions. unprecedented times yield unprecedented opportunities. the covid-19 pandemic has rendered the traditional in-person school experience impossible without mitigation strategies, and such measures, from masks to hybrid learning, may combine to make the coming school year 'less than' what would have been. yet the pandemic also presents a unique opportunity. we know that students engage most deeply with topics that affect them directly and daily, those they care about most.
if we give students a new way to actively learn about epidemiology and public health through the lens of the pandemic, we can train them to play important roles in mitigating its spread and transitioning from lockdowns to reopening. we can also give them a 'more than' experience, one that ignites their interest in stem and other education and gives them agency to prevent future pandemics.
declaration of interests. p.c.s. is a co-founder and shareholder of sherlock biosciences, and is a non-executive board member and shareholder of danaher corporation.
the proximal origin of sars-cov-2
epidemiological, demographic, and clinical characteristics of 47 cases of middle east respiratory syndrome coronavirus disease from saudi arabia: a descriptive study
modeling infectious diseases dissemination through online role-playing games
origin and evolution of pathogenic coronaviruses
finding a path to reopen schools during the covid-19 pandemic. the lancet. child & adolescent health
active learning increases student performance in science, engineering, and mathematics
what's missing in pandemic models
ontario man dies in police shooting after mask dispute in grocery store
social vulnerability and racial inequality in covid-19 deaths in health education & behavior: the official publication of the society for public health education
statistical inference for partially observed markov processes via the r package pomp
why do some covid-19 patients infect many others, whereas most don't spread the virus at all?
transmission dynamics and control of severe acute respiratory syndrome
three family members charged in shooting death of security guard who told a customer to put on a face mask
should schools reopen? kids' role in pandemic still a mystery
key: cord-272520-7mci4mip authors: goepfert, p. a.; wang, g.; mulligan, m. j.
title: identification of an er retrieval signal in a retroviral glycoprotein date: 1995-08-25 journal: cell doi: 10.1016/0092-8674(95)90026-8 sha: doc_id: 272520 cord_uid: 7mci4mip nan
soluble or membrane-spanning proteins that are resident in the endoplasmic reticulum (er) possess short amino acid sequence motifs that result in their localization in the er by retrieval, retention, or both. one such protein targeting signal, the carboxy-terminal tetrapeptide kdel (munro and pelham, 1987), retrieves soluble proteins from the golgi complex to the er lumen by interacting with the kdel receptor (lewis and pelham, 1992). a second consensus motif, consisting of two lysines located at positions -3 and either -4 or -5 from the cytoplasmic carboxyl terminus, was identified in 14 of 15 er resident type 1 membrane proteins (jackson et al., 1990) (table 1). proteins with the dilysine signal are retrieved from post-er compartments to the er (jackson et al., 1993) by cytosolic coat proteins (cops) controlling retrograde vesicular transport. type 2 membrane proteins that reside in the er possess an analogous double-arginine er localization signal at their cytoplasmic amino termini (schutze et al., 1994).
enveloped viruses bud through the cellular membrane to which their glycoproteins sort (reviewed by stephens and compans, 1988). some viral glycoproteins sort to the specialized apical or basolateral plasma membrane domains of polarized epithelial cells and direct virion budding across those membranes. for example, gp160 of human immunodeficiency virus type 1 (hiv-1) sorts to the basolateral domain (owens et al., 1991; lodge et al., 1994), while hemagglutinin of influenza virus sorts to the apical domain (roth et al., 1983). to mediate budding of infectious viral particles through intracellular membranes, viral glycoproteins must fulfill two requirements: a signal for sorting to an intracellular compartment and an interaction with the core proteins of the virus.
for example, membrane-spanning domains of the m protein of coronaviruses (machamer and rose, 1987; swift and machamer, 1991) or the g1 protein of bunyaviruses (matsuoka et al., 1994) specify glycoprotein accumulation at the golgi complex. the adenovirus glycoprotein e19 possesses a dilysine signal that localizes it to the er (jackson et al., 1990), where it binds class i molecules to diminish recognition of adenovirus-infected cells by cytotoxic t lymphocytes (cox et al., 1991).
the foamy viruses are a genus of retroviruses that infect a wide variety of mammalian hosts; e.g., they are ubiquitous in nonhuman primates, bovines, and felines and occasionally infect humans and other mammals (hooks and gibbs, 1975). however, foamy viruses have yet to be associated definitively with any disease process (weiss, 1988; mergia and luciw, 1991; neumann-haefelin et al., 1993). similar to certain classic oncoviruses, the foamy viruses utilize a b/d-type virus assembly strategy by which immature viral capsids assemble in the cytoplasm in advance of virion budding (achong et al., 1971; hooks and gibbs, 1975). as demonstrated by electron microscopy, the characteristic foamy cytopathic effect is due to syncytium formation and a proliferation of swollen, intracytoplasmic membrane-bound structures. furthermore, foamy viruses bud intracellularly into these cytoplasmic structures and also from the plasma membrane (hooks and gibbs, 1975). what mechanisms cause the foamy viruses to bud at intracellular membranes, a site of maturation that is unusual among infectious retroviruses? we hypothesized that foamy virus glycoproteins must possess a specific signal for sorting to an intracellular compartment of the secretory pathway. we then searched foamy virus glycoprotein sequences for the presence of known intracellular compartment retention/retrieval motifs.
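the dilysine rule described above (lysines at position -3 and at either -4 or -5 from the carboxyl terminus) lends itself to a simple sequence check, sketched here in python. the function names are hypothetical, position -1 is assumed to be the carboxy-terminal residue, and the test strings are invented toy sequences rather than real glycoprotein tails. the second function encodes, as an assumption, a relaxed variant in which an arginine may stand in for the second lysine.

```python
def residue_at(seq, pos):
    """Residue at a negative position counted from the C-terminus (-1 = last).

    Returns an empty string if the sequence is too short.
    """
    return seq[pos] if len(seq) >= -pos else ""


def has_dilysine_motif(seq):
    """True if lysine occupies -3 and either -4 or -5 (KKXX or KXKXX)."""
    seq = seq.strip().upper()
    return residue_at(seq, -3) == "K" and "K" in (
        residue_at(seq, -4),
        residue_at(seq, -5),
    )


def has_basic_variant(seq):
    """Relaxed rule (assumption): K at -3 and K or R at -4 or -5."""
    seq = seq.strip().upper()
    return residue_at(seq, -3) == "K" and any(
        residue_at(seq, pos) in ("K", "R") for pos in (-4, -5)
    )


# Invented toy tails, for illustration only.
toy_tails = ["MAAAKKAA", "MAAKAKAA", "MAAAAKAA", "MAARRRKAA"]
dilysine_hits = [has_dilysine_motif(t) for t in toy_tails]
```

a scan of this kind only flags candidate signals; whether a motif is functional still has to be established experimentally.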
viral sequences were obtained from the original publications or from the national center for biotechnology information database. remarkably, we identified the dilysine er retrieval signal at the cytoplasmic carboxyl termini of four of the five available foamy virus glycoprotein sequences (table 2). the bovine syncytial virus (bsv) possessed a lysine in position -3 and arginines in positions -4, -5, and -6, similar to the rabbit 53 kda sarcoplasmic reticulum protein (ser 53kd; see table 1). it is likely that the arginines in positions -4, -5, and -6 of bsv compensate for the lack of lysine in positions -4 or -5, as was suggested for ser 53kd (jackson et al., 1990). the conservation of this well-characterized er localization signal within the glycoproteins of foamy viruses of human, chimpanzee, rhesus macaque, african green monkey, or bovine origin suggested that it fulfills a critical function in foamy virus biology. the dilysine motif was not present in the glycoproteins of lentiviruses, oncoviruses, or an intracisternal a particle (data not shown).
table 1. the carboxy-terminal amino acid sequences of e19 protein of adenovirus serotype 3 and three other er resident type 1 membrane proteins are shown. the lysines at position -3, -4, or -5 (shown in bold) have been experimentally determined to be essential for localization of these proteins to the er (jackson et al., 1990).
this finding allowed the formulation of a working mechanistic model for foamy virus assembly. first, following translation, protein folding, and oligomerization, foamy virus glycoproteins follow the secretory pathway out of the er. second, in early post-er compartments, the glycoproteins meet one of two possible fates, budding or retrieval, mediated by their cytoplasmically exposed glycoprotein tails.
if the glycoprotein cytoplasmic tails interact with preassembled viral capsids, which may localize to the same intracellular site, coordinated budding of glycoprotein-bearing infectious virus particles into an early post-er compartment results. if the glycoprotein dilysine signal interacts with the cytoplasmic coatomer complex, retrieval of the glycoprotein from the golgi to the er by retrograde vesicular transport results. viral glycoproteins reaching the golgi are thereby recycled and may complete a budding interaction on their next transit through the early post-er compartment. third, the prominent cytoplasmic vacuolization that is the hallmark of foamy virus infection is a cytopathic effect of this virus on the secretory pathway. fourth, since tissue cultures infected with foamy viruses form syncytia and since virus budding from the plasma membrane occurs in some cell types (gelderblom and frank, 1987), foamy virus glycoproteins must also escape through the golgi stacks by the default secretory pathway. this implies either that the interactions of glycoprotein cytoplasmic tails with viral capsids or cops are saturable or that competitive interactions with other proteins (or viral cytopathic effects) limit the efficiency of budding or retrieval and create the third possible fate for the glycoproteins: escape to the plasma membrane. the post-er intermediate compartment protein ergic-53, which also possesses a dilysine signal, was reported to saturate its pre-golgi retention machinery when overexpressed, leading to cell surface expression of ergic-53 (kappeler et al., 1994). teleologically, foamy viruses appear to have evolved a novel assembly strategy to ensure survival. the role of the er retrieval signal in the biology of these superior symbiotic viruses can be established with biochemical and genetic experimental approaches.
furthermore, the foamy viruses may be useful probes of the newly appreciated role of coatomer (reviewed by pelham, 1994) in selective retrograde transport of membrane proteins from the golgi complex to the er.
renshaw and casey, 1994
table 2. the carboxy-terminal cytoplasmic domains of glycoproteins of the five foamy viruses for which glycoprotein sequences are available. the lysines at positions -3 and either -4 or -5 (shown in bold) represent the dilysine motif.
animal virus structure
intervirology
key: cord-291790-z5rwznmv authors: li, qianqian; wu, jiajing; nie, jianhui; zhang, li; hao, huan; liu, shuo; zhao, chenyan; zhang, qi; liu, huan; nie, lingling; qin, haiyang; wang, meng; lu, qiong; li, xiaoyu; sun, qiyu; liu, junkai; zhang, linqi; li, xuguang; huang, weijin; wang, youchun title: the impact of mutations in sars-cov-2 spike on viral infectivity and antigenicity date: 2020-07-17 journal: cell doi: 10.1016/j.cell.2020.07.012 sha: doc_id: 291790 cord_uid: z5rwznmv
summary the spike protein of sars-cov-2 has been undergoing mutations and is highly glycosylated. it is critically important to investigate the biological significance of these mutations. here we investigated 80 variants and 26 glycosylation site modifications for the infectivity and reactivity to a panel of neutralizing antibodies and sera from convalescent patients. d614g, along with several variants containing both d614g and another amino acid change, were significantly more infectious. most variants with amino acid change at receptor binding domain were less infectious but variants including a475v, l452r, v483a and f490l became resistant to some neutralizing antibodies. moreover, the majority of glycosylation deletions were less infectious whilst deletion of both n331 and n343 glycosylation drastically reduced infectivity, revealing the importance of glycosylation for viral infectivity. interestingly, n234q was markedly resistant to neutralizing antibodies, whereas n165q became more sensitive.
these findings could be of value in the development of vaccine and therapeutic antibodies.
the covid-19 pandemic is a tremendous threat globally. as of july 3, 2020, 216 countries have reported covid-19 cases, with more than 10 million confirmed cases and approximately 518,000 deaths (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/). the causative agent of covid-19, sars-cov-2, causes a lower respiratory tract infection that can progress to severe acute respiratory syndrome and even multiple organ failure (lv et al., 2020a; yang et al., 2020).
sars-cov-2 is a single-stranded positive-strand rna virus whose genome encodes four structural proteins: spike (s), small protein (e), matrix (m) and nucleocapsid (n) (chan et al., 2020).
the s protein is a type i fusion protein that forms trimers on the surface of the virion. it is composed of two subunits, with s1 responsible for receptor binding and s2 for membrane fusion
single mutants were also constructed to compare with the double mutants with d614g. group c is comprised of 26 mutants at the putative glycosylation sites (22 sites). this group includes both variants (n74k, n149h and t719a) and investigational mutants that we made for the analyses of the effects of glycosylation. specifically, all 22 sites (n to q) were made in the lab to generate 22 individual mutants; we also made a combination by deleting the two glycosylation sites in rbd. in total, we have generated 106 pseudotyped viruses, i.e., 80 variants and 26 glycosylation mutants (figure 1). these viruses were prepared as described previously (nie et al., 2020) (see star methods).
to determine the infectivity of these variants and mutants, we first infected 26 cell lines with pseudotyped viruses with either sars-cov-2 s protein or vsv g protein (see star methods). as expected, the two types of pseudotyped viruses are different in the infection efficiency in the 26 cell lines (figure 2). while almost all cell lines were generally susceptible to infection by vsv g pseudotyped virus, sars-cov-2 pseudotyped virus could efficiently infect certain cell lines including three human cell lines (293t-hace2, 293t and huh-7) and three non-human primate cell lines (vero, veroe6 and llc-mk2). as such, we selected four of these six cell lines for subsequent experiments: 293t-hace2, huh-7, vero and llc-mk2.
we first tested the infectivity of 106 pseudotyped viruses (80 natural variants and 26 glycosylation mutants) in 293t-hace2 cells, where a difference by 4-fold in rlu compared with the reference wuhan-1 strain (genbank: mn908947) was deemed as being significant (figure s1).
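the fold-change thresholds used in this section (a 4-fold difference in rlu counted as significant, a 4- to 100-fold decrease as low infectivity, and a decrease of more than 100-fold as no infectivity) can be written as a small classification rule. the python sketch below is illustrative only: the function name, the category labels and the handling of values that fall exactly on a boundary are assumptions, not part of the published method.

```python
def classify_infectivity(variant_rlu, reference_rlu):
    """Classify a variant by its RLU ratio to the reference strain.

    Thresholds follow the text: >= 4-fold increase, 4- to 100-fold
    decrease (low infectivity), > 100-fold decrease (no infectivity).
    Boundary handling is an assumption made for this sketch.
    """
    ratio = variant_rlu / reference_rlu
    if ratio >= 4.0:
        return "increased"
    if ratio > 0.25:
        return "comparable"
    if ratio >= 0.01:
        return "low"
    return "none"


# Fold reductions quoted in the text, expressed with arbitrary RLU units.
labels = {
    "n331q": classify_infectivity(100.0, 300.0),         # 3-fold reduction
    "n343q": classify_infectivity(100.0, 2000.0),        # 20-fold reduction
    "n331q+n343q": classify_infectivity(1.0, 1200.0),    # 1200-fold reduction
}
```

the rlu values here are placeholders chosen to reproduce the quoted fold changes; real readings would come from the luciferase assay.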
of all 106 pseudotyped viruses, 22 were determined as low-infectivity (16 natural mutants and 6 glycosylation mutants), with rlu reading decreased by 4- to 100-fold (figure 3a). among them, 13 were located in the rbd region. variant v341i and investigational glycosylation mutant (n331q+n343q) were deemed as no-infectivity as demonstrated by over 100-fold decrease in rlu values compared with the reference strain. both of them were located in rbd. it is worth noting that double glycosylation deletions at n331 and n343 resulted in a drastic reduction in viral infectivity (1200-fold), whereas single deletion at each site caused modest reduction in viral infectivity, with the infectivity of n331q reduced by only 3-fold and n343q by 20-fold. moreover, the non-natural double glycosylation mutations in rbd (n331q and n343q) resulted in significantly reduced infectivity, suggesting that the two glycosylation sites in the rbd region may participate in the binding of the receptor or maintain the conformation of the rbd region.
the remaining 63 variants were tested further with the other three cell lines for infectivity suggesting that the enhanced infectivity was more likely ascribed to d614g itself.
antibodies
having identified the variants with altered infectivity, we next set out to investigate the antigenicity of the infectious mutants using 13 neutralizing monoclonal antibodies (mabs) (see star methods). it was noted that some changes in rbd region demonstrated altered sensitivity to neutralizing mabs (figure 4 and figure s2). specifically, a475v reduced the sensitivity to mabs 157, 247, cb6, p2c-1f11, b38 and ca1, while f490l reduced the sensitivity to mabs x593, 261-262, h4 and p2b-2f6. moreover, v483a became resistant to mabs x593 and p2b-2f6, and l452r to mabs x593 and p2b-2f6.
finally, y508h reduced the sensitivity to mab h014, n439k to mab h00s022, a831v to mab b38, d614g+i472v to mab x593 and d614g+a435s to mab h014 by more than 4 times. in addition, some changes in the rbd region, including v367f, q409e, q414e, i468f, i468t, y508h and a522v, were observed to be more susceptible to neutralization mediated by mabs.
we next determined how infectious glycosylation mutants reacted to the same panel of mabs. mutant n165q actually became more sensitive to mab p2b-2f6, whereas n234q reduced the neutralization sensitivity to a different set of mabs including 157, 247, cb6, p2c-1f11, h00s022, b38, ab35 and h014. these results confirmed that these two glycosylation sites are important for receptor binding.
these mabs have proven to be valuable in our analyses of the amino acid changes. as shown in figure 4, five mabs, i.e., 157, 247, cb6, p2c-1f11 and b38, were unable to effectively neutralize both a475v and n234q. neither x593 nor p2b-2f6 was effective in neutralizing l452r, v483a and f490l whilst p2b-2f6 was more effective in neutralizing n165q. in addition, mab h014 was incapable of neutralizing n234q, y508h and d614g+a435s while mabs h4 and 261-262 were found not to neutralize f490l. finally, h00s022 was unable to neutralize n439k and n234q.
finally, we determined the sensitivity of the strains with amino acid changes to ten covid-19 convalescent sera (see star methods). none of the variants and mutants demonstrated significantly altered sensitivity to all 10 convalescent sera, i.e., the ec50 values were not altered by more than 4-fold, irrespective of an increase or decrease, when compared with the reference strain (figure 5a and figure s3).
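the 4-fold ec50 criterion applied to the mabs and sera can be sketched in python as follows. the interpolation routine is a deliberately simple stand-in (linear interpolation of percent inhibition on a log10 dose scale) and is an assumption: the text does not say how ec50 values were calculated from the dilution series. the function names and the toy numbers are likewise hypothetical.

```python
import math


def ec50_by_interpolation(doses, inhibition):
    """Estimate the dose giving 50% inhibition by log-linear interpolation.

    doses:      positive dose or dilution values
    inhibition: percent inhibition measured at each dose
    Returns None if the curve never crosses 50%.
    """
    pairs = sorted(zip(doses, inhibition))
    for (d0, y0), (d1, y1) in zip(pairs, pairs[1:]):
        if y0 != y1 and (y0 - 50.0) * (y1 - 50.0) <= 0:
            frac = (50.0 - y0) / (y1 - y0)
            log_ec50 = math.log10(d0) + frac * (math.log10(d1) - math.log10(d0))
            return 10 ** log_ec50
    return None


def fold_change(variant_ec50, reference_ec50, threshold=4.0):
    """Return (ratio, altered) where altered means a change of more than
    `threshold`-fold in either direction, as in the text."""
    ratio = variant_ec50 / reference_ec50
    fold = max(ratio, 1.0 / ratio)
    return ratio, fold > threshold


# Toy dilution series, for illustration only.
toy_ec50 = ec50_by_interpolation([1, 100], [0.0, 100.0])
ratio, altered = fold_change(50.0, 10.0)
```

whether a ratio above 1 means resistance or increased sensitivity depends on whether ec50 is expressed as a concentration or as a serum dilution, so the sketch reports only the size of the change.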
however, the neutralization sensitivity of both f490l and h519p to three of ten patient sera was found to have decreased by more than 4 times, while six variants and mutants (n149h, n149q, n165q, n354d, n709q and n1173q) became over 4-fold more sensitive to one or two of the ten tested sera. notably, five out of the six were glycan deletion mutants.
as shown in figure 5b, when the data of individual convalescent sera were pooled together to analyze the sensitivity of all variants, no marked difference was observed (>4 fold). however, modest differences between some variants and reference strain (within 4-fold) were observed in their reactivity to grouped convalescent sera. these differences were statistically significant (p<0.05). it is worth mentioning that some variants including f338l, v367f, i468f, i468t and v615l (figure 5b) were even more sensitive to the convalescent sera compared with reference strain, whereas more variants were found to be resistant to the convalescent sera. these variants include single amino acid changes such as y145del, q414e, n439k, g446v, k458n, i472v, a475v, t478i, v483i, f490l and a831v, as well as the double amino acid changes including d614g+q321l, d614g+i472v, d614g+a831v, d614g+a879s and d614g+m1237i.
similar to natural variants, although the magnitude of some glycosylation deletions in sensitivity to the sera is less than 4-fold, the differences between mutants and the reference strain (wuhan-1) were found to be still several-fold and statistically significant, i.e., glycosylation mutants n331q and n709q significantly increased the sensitivity to convalescent sera
and ambiguous sequences, we narrowed down to 80 variants.
moreover, as glycosylation of viral protein is well documented to affect viral replication and immune response and sars-cov-2 s protein is heavily glycosylated, we also made 26 substitutional mutations at all 22 putative glycosylation sites. in total, we made 106 pseudotyped viruses, allowing us to characterize them using the established method (nie et al., 2020) (see star methods).
table 1 summarizes the characteristics of variants and investigational mutants. of all variants, d614g is of particular note. this variant has been shown to be rapidly accumulating since its emergence and linked to more clinical presentations (korber et al., 2020). at the beginning of this study (may 6, 2020), it accounted for 62.8% of all circulating strains, but by july 3, it had reached 75.7%. this dominant strain could effectively infect the four cell lines tested, being 10-fold more infectious than the original wuhan-1 strain (figure 3).
another important finding is that natural variants capable of affecting the reactivity to neutralizing mabs were almost all located in the rbd region (except a831v and d614g+a831v), as all antibodies used in this study were targeting the rbd
with decreased sensitivity to neutralization by p2b-2f6 mab; as both l452r and f490l remain sensitive to p2c-1f11, suggesting this mab is not derived from the same clone for p2b-2f6. moreover, both mutants displayed decreased sensitivity to another neutralizing mab x593 by 10-fold compared with the reference strain (figure 4).
while we identified multiple variants with decreased sensitivity to neutralizing mabs, we need to look at how frequent these variants are in the field. v483a in rbd is one of the two variants with a mutation frequency of over 0.1%. it showed decreased reactivity to the two mabs (p2b-2f6 and x593) (figure 6a and 6b) (ju et al., 2020). another rbd variant a475v sits in the binding epitope of rbd.
it is significantly resistant to several neutralizing mabs including p2c-1f11, ca1, 247 and cb6. it is noteworthy that cb6 mab targets the receptor binding epitope (figure 6c and 6d) (shi et al., 2020). specifically, y508 was buried in the epitope targeted by mab h014 (figure 6e and 6f) (lv et al., 2020b). indeed, the y508h was found to be resistant to this mab.
it is worth mentioning that d614g+i472v has shown increased infectivity and more resistance to neutralizing antibodies (table 1), but only one sequence (originated from canada) was reported in gisaid. moreover, some variants, including n439k, l452r, a475v, v483a, f490l and y508h, do have decreased sensitivity to neutralizing mabs. however, only v483a exceeded 0.1% in frequency at the beginning of the study, all of which were found in us, with 28 sequences reported as of may 6, 2020, and 36 up to july 3, 2020. variants containing n439k showed a significant increase in circulation, i.e., from 5 cases reported as of may 6, 2020 (all in uk) to 47 by july 3, 2020 (45 in uk, 2 in romania). in addition, only one sequence from france containing y508h was deposited in gisaid as of may 6, while four sequences were reported as of july 3, 2020, of which two originated from netherlands, one from sweden, and one from france. only one or two isolates were reported for other variants, which have not been observed to have increased during the time frame we have been monitoring. nevertheless, as rna viruses mutate all the time and some variants may only appear during a certain period of time, while others could emerge in an unpredictable fashion, continued analyses of the circulating strains in terms of the mutation frequency and temporal pattern are warranted.
our results suggest that the 13 mabs used in this study could be divided into seven groups as they appear to be different in the inhibitory effects on the variants.
as such, it would be interesting to formulate a therapeutic regimen comprised of at least two mabs. for example, a combination of p2c-1f11 and x593 should be effective to inhibit all variants in this study. it would be of interest to test more neutralizing antibodies which could be targeting epitopes outside rbd.
with regard to the glycosylation mutants analyzed in this study, n165q increased the sensitivity to mab p2b-2f6 whilst n234q displayed resistance to neutralizing mabs such as ca1, cb6, 157 and others. although neither of them is found in circulation, the reactivity of these two mutants to neutralizing mab is still worth noting. as n165 and n234 are located near the rbd region (watanabe et al., 2020), these mutants may affect some epitopes targeted by neutralizing mabs. specifically, n165 glycosylation site is involved in the binding of mab to the rbd region of s protein (cao et al., 2020). it is likely that the sugar chain can mask the epitope targeted by the antibody. this type of glycan shield has been observed in other viruses such as hiv-1.
the use of sera from 10 convalescent patients in neutralizing assay largely confirmed the results obtained with the well characterized neutralizing mabs. it is understood that the magnitude of altered reactivity is slightly smaller with human sera than that with mabs, given that polyclonal antibodies from convalescent patients are directed against multi-epitopes on the full-length s protein; as a result, these polyclonal antibodies could complement one another. however, the differences in their reactivity to the human antibodies were found to be by several folds in most cases and all determined as statistically significant. notably, some rbd variants such as a475v and f490l have been confirmed to have decreased sensitivity to both human sera and multiple neutralizing mabs.
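the suggestion made earlier in this discussion, that two mabs with complementary profiles could together cover every variant, can be checked mechanically. the python sketch below encodes the reduced-sensitivity findings listed in the results section as a lookup table and reports which variants would escape a given pair. the table is transcribed from the text above; the data structure and function name are assumptions for illustration, and the check ignores the magnitude of each effect.

```python
# Variant -> mAbs to which it showed reduced sensitivity (as listed in the text).
REDUCED_SENSITIVITY = {
    "a475v": {"157", "247", "cb6", "p2c-1f11", "b38", "ca1"},
    "f490l": {"x593", "261-262", "h4", "p2b-2f6"},
    "v483a": {"x593", "p2b-2f6"},
    "l452r": {"x593", "p2b-2f6"},
    "y508h": {"h014"},
    "n439k": {"h00s022"},
    "a831v": {"b38"},
    "d614g+i472v": {"x593"},
    "d614g+a435s": {"h014"},
    "n234q": {"157", "247", "cb6", "p2c-1f11", "h00s022", "b38", "ab35", "h014"},
}


def uncovered_variants(mab_a, mab_b, table=REDUCED_SENSITIVITY):
    """Variants with reduced sensitivity to both mAbs of a proposed pair."""
    return sorted(
        variant
        for variant, mabs in table.items()
        if mab_a in mabs and mab_b in mabs
    )


suggested_pair = uncovered_variants("p2c-1f11", "x593")
same_group_pair = uncovered_variants("x593", "p2b-2f6")
```

under this table the pair p2c-1f11 plus x593 leaves no variant uncovered, whereas pairing two mabs with overlapping profiles, such as x593 and p2b-2f6, leaves l452r, v483a and f490l uncovered.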
a475v reduced the sensitivity to 6 mabs out of the 13 mabs used in this study, while f490l reduced the sensitivity to neutralization by 3 mabs. it is possible that antibodies in convalescent sera are able to neutralize these critical epitopes targeted by these mabs that are known to disrupt the binding of the s protein to hace2 receptor (ju et
serial dilutions of mab preparations were pre-incubated with the pseudotyped viruses at 37°c for one hour before they were added to huh-7 cells. luciferase activity was measured 24 hours later to calculate ec50 of each antibody. the ratio of ec50 between the variant or mutant strains and the reference strain (wuhan-1) was calculated and analyzed to generate heatmap using hem i (deng et al., 2014). the data were the results from 3-5 replicates. the red and blue boxes indicate the increase or decrease of the neutralization activity as shown in the scale bar. see also figure s2.
serial dilutions of mab preparations were pre-incubated with the virus at 37°c for one hour before they were added to huh-7 cells. luciferase activity was measured 24 hours later to calculate ec50 of each antibody. the y-axis represents the ratio of ec50 between the variant/mutant strain and the reference strain (wuhan-1). the data were the results from 3-5 replicates. the horizontal dashed lines indicate the threshold of 4-fold difference. the significant changes were marked with colored symbols, blue for decreased, red for increased. related to
further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, dr. youchun wang (wangyc@nifdc.org.cn).
all the unique reagents generated in this study are available from the lead contact with a completed materials transfer agreement.
this study did not generate any unique datasets or code.
primers.
following site-directed mutagenesis pcr, the template chain was digested using dpni restriction endonuclease (neb, usa). afterwards, the pcr product was directly used to transform e. coli dh5α competent cells; single clones were selected and then sequenced. the primers designed for the specific mutation sites are listed in table s2, and the frequency of different variants in the epidemic population is listed in table s1.

highlights
over 100 mutations were selected for analyses on their infectivity and antigenicity.
the dominant d614g itself and combined with other mutations are more infectious.
ablation of both n331 and n343 glycosylation at rbd drastically reduced infectivity.
ten mutations such as n234q, l452r, a475v, v483a were markedly resistant to some mabs.

eighty natural variants and twenty-six glycosylation spike mutants of sars-cov-2 were analyzed in terms of infectivity and antigenicity using high throughput pseudovirus assay in conjunction with neutralizing antibodies.
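the fold-change comparison described in the figure legends (the ratio of ec50 between a variant or mutant strain and the wuhan-1 reference, with a 4-fold threshold) can be sketched as follows. this is an illustrative python sketch under stated assumptions, not the authors' analysis code: the function name, the inclusive treatment of the threshold, the flag labels, and the example values are all hypothetical, and the sketch leaves out the statistical testing that the study also applied before marking a change as significant.

```python
def ec50_fold_change(ec50_variant, ec50_reference, threshold=4.0):
    """Return (ratio, flag) for a variant relative to the reference strain.

    flag is "above" when ratio >= threshold, "below" when
    ratio <= 1/threshold, and "within" otherwise.
    The inclusive boundaries are an assumption of this sketch.
    """
    if ec50_variant <= 0 or ec50_reference <= 0:
        raise ValueError("EC50 values must be positive")
    ratio = ec50_variant / ec50_reference
    if ratio >= threshold:
        return ratio, "above"
    if ratio <= 1.0 / threshold:
        return ratio, "below"
    return ratio, "within"


# Hypothetical EC50 values, for illustration only.
for variant, reference in [(80.0, 10.0), (2.0, 10.0), (15.0, 10.0)]:
    print(ec50_fold_change(variant, reference))
```

the flags are kept deliberately neutral ("above", "below") because whether a larger ratio means more or less neutralization depends on how ec50 is expressed (antibody concentration versus serum dilution).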
[figure axis labels, panels a and b: the reference strain; convalescent sera cs1, cs2, cs3, cs4, cs6, cs7, cs8, cs10, cs86 and cs87; the natural variants (l5f through g1124v, plus y145del+r408i); the d614g single and combined variants (d614g+l5f through d614g+p1263l); and the glycosylation mutants (n17q through n1194q, n74k, n149h).]

sars-cov-2 viral spike g614 mutation exhibits higher case fatality rate
potent neutralizing antibodies against sars-cov-2 identified by high-throughput single-cell sequencing of convalescent patients' b cells
genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan
mutated covid-19, may foretells mankind in a great risk in the future
hemi: a toolkit for illustrating heatmaps
ebola virus glycoprotein with increased infectivity dominated the 2013-2016 epidemic
the hiv glycan shield as a target for broadly neutralizing antibodies
the spike protein of sars-cov--a target for vaccine and therapeutic development
why are rna virus mutation rates so damn high?
the highly conserved glycan at asparagine 260 of hiv-1 gp120 is indispensable for viral entry
identification of immunodominant sites on the spike protein of severe acute respiratory syndrome (sars) coronavirus: implication for developing sars diagnostics and vaccines
airborne transmission of influenza a/h5n1 virus between ferrets
n-linked glycans and k147 residue on hemagglutinin synergize to elicit broadly reactive h1n1 influenza virus
coronavirus spike protein and tropism changes
human neutralizing antibodies elicited by sars-cov-2 infection
crystal structure of a fully glycosylated hiv-1 gp120 core reveals a stabilizing role for the glycan at asn262
tracking changes in sars-cov-2 spike: evidence that d614g increases infectivity of the covid-19 virus
structural, glycosylation and antigenic variation between 2019 novel coronavirus (2019-ncov) and sars coronavirus
quasispecies theory and the behavior of rna viruses
functional assessment of cell entry and receptor usage for sars-cov-2 and other lineage b betacoronaviruses
structure, function, and evolution of coronavirus spike proteins
removal of a single n-linked glycan in human immunodeficiency virus type 1 gp120 results in an enhanced ability to induce neutralizing antibody responses
coronavirus disease (covid-19): a scoping review
structural basis for neutralization of sars-cov-2 and sars-cov by a potent therapeutic antibody
establishment and validation of a pseudovirus neutralization assay for antigenic drift of influenza a(h7n9) virus hemagglutinin
influenza a(h7n9) virus evolution: which genetic mutations are antigenically important?
a virus that has gone viral: amino acid mutation in s protein of indian isolate of coronavirus covid-19 might impact receptor binding, and thus, infectivity
emerging genetic diversity among clinical isolates of sars-cov-2: lessons for today
a human neutralizing antibody targets the receptor binding site of sars-cov-2
a single mutation in chikungunya virus affects vector specificity and epidemic potential
human adaptation of ebola virus during the west african outbreak
two n-linked glycosylation sites in the v2 and c2 regions of human immunodeficiency virus type 1 crf01_ae envelope glycoprotein gp120 regulate viral neutralization susceptibility to the human monoclonal antibody specific for the cd4 binding domain
emergence of genomic diversity and recurrent mutations in sars-cov-2
emerging wuhan (covid-19) coronavirus: glycan shield and structure prediction of spike glycoprotein and its interaction with human cd26
function, and antigenicity of the sars-cov-2 spike glycoprotein
structural and functional basis of sars-cov-2 entry by using human ace2
a systematic study of the n-glycosylation sites of hiv-1 envelope protein on infectivity and antibody-mediated neutralization
n463 glycosylation site on v5 loop of a mutant gp120 regulates the sensitivity of hiv-1 to neutralizing monoclonal antibodies vrc01/03
site-specific glycan analysis of the sars-cov-2 spike
cryo-em structure of the 2019-ncov spike in the prefusion conformation
a noncompeting pair of human neutralizing antibodies block covid-19 virus binding to its receptor ace2
clinical course and outcomes of critically ill patients with sars-cov-2 pneumonia in china: a single-centered, retrospective, observational
study
characterization of a filovirus (mengla virus) from rousettus bats in china
role of stem glycans attached to haemagglutinin in the biological characteristics of h5n1 avian influenza virus

pseudotyped viruses incorporated with spike protein from either sars-cov-2, variants or mutants were constructed using a procedure described by us recently (nie et al., 2020). on the day before transfection, 293t cells were prepared and adjusted to a concentration of 5-7 × 10^5 cells/ml, 15 ml of which were transferred into a t75 cell culture flask and incubated overnight at 37°c in an incubator conditioned with 5% co2. the cells generally reach 70-90% confluence after overnight incubation. thirty micrograms of dna plasmid expressing the spike protein was transfected according to the user's instruction manual. the transfected cells were subsequently infected with g*δg-vsv (vsv g pseudotyped virus) at a concentration of 7.0 × 10^4 tcid50/ml. these cells were incubated at 37°c for 6-8 hours in the presence of 5% co2. afterwards, the cell supernatant was discarded, followed by rinsing the cells gently with pbs + 1% fbs. next, 15 ml of fresh complete dmem was added to the flask and the cells were cultured for 24 h. twenty-four hours post infection, culture supernatants containing sars-cov-2 pseudotyped viruses were harvested, filtered (0.45-µm pore size, millipore, cat#slhp033rb) and stored at -70°c in 2-ml aliquots until use.

the 50% tissue culture infectious dose (tcid50) of sars-cov-2 pseudovirus was determined using a single-use aliquot from the pseudovirus bank to avoid inconsistencies resulting from repeated freeze-thaw cycles.

for titration of the pseudotyped virus, a 2-fold initial dilution with six replicates was made in 96-well culture plates followed by serial 3-fold dilutions. the last column was employed as the cell control without pseudotyped virus.
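the titration layout just described (a 2-fold initial dilution followed by serial 3-fold dilutions) corresponds to a simple geometric series of dilution factors, sketched below in python. the function name and the default of 11 dilution steps are assumptions made for illustration (a 12-column plate with the last column reserved for the cell control); the source does not state the number of steps.

```python
def dilution_series(initial_fold=2, step_fold=3, n_steps=11):
    """Reciprocal dilution factors for a plate titration.

    The first entry is the initial dilution; each later entry is
    step_fold times more dilute than the one before it.
    n_steps=11 is an assumed value, not taken from the source.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return [initial_fold * step_fold ** k for k in range(n_steps)]


series = dilution_series()
print(series[:4])   # first four reciprocal dilutions
print(len(series))  # number of dilution columns
```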
subsequently, the 96-well plates were seeded with huh-7 cells adjusted to 2 × 10^5 cells/ml. after 24 h incubation at 37°c in a humidified atmosphere with 5% co2, the supernatant was aspirated and discarded gently to leave 100 µl in each well; next, 100 µl of luciferase substrate (perkinelmer, cat#6066769) was added to each well. after 2-min incubation at room temperature in the dark, 150 µl of lysate was transferred to white 96-well plates for the detection of luminescence using a luminometer (perkinelmer, ensight). a well was determined to be positive when its relative luminescence unit (rlu) value was ten-fold higher than that of the negative (cells only) control. the 50% tissue culture infectious dose (tcid50) was calculated using the reed-muench method (nie et al., 2020).

before quantification, all the pseudotyped viruses were purified through a 25% sucrose cushion by ultra-centrifugation at 100,000 × g for 3 h (nie et al., 2020) resources table.

using quantitative rt-pcr, we normalized the pseudotyped virus particles to the same amount. after normalization, 100 µl of the pseudotyped virus with 10-fold dilution was added to wells in a 96-well cell culture plate. after the cells were trypsin-digested, 2 × 10^4 cells in 100 µl were added to each well in the 96-well plates. the plates were then incubated at 37°c in a humidified atmosphere with 5% co2. after incubation for 24 hours, chemiluminescence detection was performed as described in the titration of pseudotyped viruses. each group contained 3-5 replicates.

the virus neutralization assay was conducted as described previously (nie et al., 2020). briefly, 100 µl serial dilutions of human sera or monoclonal antibody preparations were added into 96-well plates. after that, 50 µl of pseudoviruses at a concentration of 1300 tcid50/ml were added into the plates, followed by incubation at 37°c for 1 hour.
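the reed-muench endpoint calculation cited above for the titration can be sketched as follows. this is a generic textbook version of the method written in python, offered as an illustration only; it is not the authors' code, and the function name, argument names and example well counts are hypothetical. the sketch assumes that the dilutions are listed from least to most dilute, that every dilution uses the same number of wells, and that the 50% point is bracketed by the data. it returns only the reciprocal endpoint dilution; converting that to tcid50/ml additionally requires the inoculum volume, which is not handled here.

```python
import math


def reed_muench_endpoint(dilutions, positives, wells_per_dilution):
    """Reciprocal dilution at which 50% of wells would be positive.

    dilutions: reciprocal dilution factors, least to most dilute.
    positives: number of positive wells at each dilution.
    """
    n = len(dilutions)
    negatives = [wells_per_dilution - p for p in positives]
    # Cumulative positives are summed from the most dilute end upward;
    # cumulative negatives from the least dilute end downward.
    cum_pos = [sum(positives[i:]) for i in range(n)]
    cum_neg = [sum(negatives[: i + 1]) for i in range(n)]
    pct = [100.0 * p / (p + q) for p, q in zip(cum_pos, cum_neg)]
    for i in range(n - 1):
        if pct[i] >= 50.0 > pct[i + 1]:
            # Proportionate distance between the bracketing dilutions.
            pd = (pct[i] - 50.0) / (pct[i] - pct[i + 1])
            log_step = math.log10(dilutions[i + 1] / dilutions[i])
            return 10 ** (math.log10(dilutions[i]) + pd * log_step)
    raise ValueError("50% endpoint is not bracketed by the data")


# Hypothetical plate: six wells per dilution, 3-fold steps from 1:2.
endpoint = reed_muench_endpoint([2, 6, 18, 54, 162], [6, 6, 3, 0, 0], 6)
print(round(endpoint, 2))
```

interpolating on the log scale between the two bracketing dilutions is what distinguishes reed-muench from simply reporting the last dilution above 50%.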
afterwards, huh-7 cells were added into the plates (2 × 10^4 cells in 100 µl per well), followed by incubation at 37°c in a humidified atmosphere with 5% co2. chemiluminescence detection was performed after 24 hours of incubation. the reed-muench method was used to calculate the virus neutralization titer. the results are based on 3-5 replicates unless specified. in order to validate the test operation process, the coefficient of variation (cv) of the six replicate control wells is required to be within 30%, as is the cv of the duplicate sample wells.

graphpad prism 8 was used for plotting and statistical analysis; the values were expressed as mean ± sem. one-way anova and holm-sidak's multiple comparisons test were used to analyze the differences between groups. a p-value of less than 0.05 was considered to be significant. * p<0.05, ** p<0.01, *** p<0.005, **** p<0.001; ns represents no significant difference.

key: cord-295894-us5x1jtg authors: li, bing; hao, jiaqing; zeng, jun; sauter, edward r. title: snapshot: fabp functions date: 2020-08-20 journal: cell doi: 10.1016/j.cell.2020.07.027 sha: doc_id: 295894 cord_uid: us5x1jtg fatty acid binding proteins (fabps) serve as intracellular chaperones for fatty acids and other hydrophobic ligands inside cells. recent studies have demonstrated new functions of individual members of the fabp family. this snapshot describes the overall functions of fabps in health and disease and highlights emerging roles of adipose fabp (a-fabp) and epidermal fabp (e-fabp) in the fields of obesity, chronic inflammation, and cancer development.
abbreviations: fabps, fatty acid binding proteins; nfκb, nuclear factor kappa b; cox2, cyclooxygenase-2; cer, ceramide; lxr, liver x receptor; pge2, prostaglandin e2; scd1, stearoyl-coa desaturase 1; stat, signal transducer and activator of transcription; lta4, leukotriene a4; rar, retinoic acid receptor; nlrp3, nucleotide-binding domain leucine-rich repeat and pyrin domain containing 3; asc, apoptosis-associated speck-like protein containing a caspase-recruitment domain; ifnβ, interferon β; ltb4, leukotriene b4; hsl, hormone sensitive lipase; ppar, peroxisome proliferator-activated receptor; dio2, deiodinase type 2; β-ar, β-adrenergic receptor; aldh1, aldehyde dehydrogenase isoform 1; pck1, phosphoenolpyruvate carboxykinase 1; pi3k, phosphatidylinositol-3-kinase; mapk, mitogen-activated protein kinase; erk, extracellular regulated kinase.

as evolutionarily conserved proteins, fatty acid binding proteins (fabps) play a central role in coordinating lipid transport, metabolism, and responses in various tissues and organs across species (storch and corsico, 2008). fabps were named according to the tissue where they were originally identified. for example, the fabp predominantly expressed in the liver was named l-fabp (also known as fabp1), whereas the fabp mainly found in the heart was named h-fabp (also known as fabp3). the fabp family is composed of at least nine homologous proteins with similar tertiary structures and specific tissue distribution patterns (left side). all fabp members are able to bind hydrophobic lipid ligands in the cavity of the β barrel structure, which is made of 10 anti-parallel β strands and capped by a helix-turn-helix motif. because of differences in their amino-acid sequences, fabp family members possess different lipid ligand-binding specificity and affinity. moreover, individual fabp members exhibit unique functionality that reflects the unique environments of the tissues and organs where they are expressed.
generally, fabps function as cytoplasmic lipid chaperones to (1) facilitate fatty acid solubilization, trafficking, and metabolism; (2) interact with various membrane and intracellular proteins (e.g., peroxisome proliferator-activated receptors [ppars] , hormone sensitive lipase [hsl]); and (3) regulate tissue and cellular specific lipid responses (middle wheel panel). in doing so, fabps carry out pleiotropic functions to maintain tissue homeostasis in health and to participate in disease pathogenesis (left side). with the prevalence of obesity, adipose-fabp (a-fabp) and epidermal-fabp (e-fabp) have become the two most studied fabp family members because of their remarkable functions in obesity-associated diseases in both animal and human studies (hotamisligil and bernlohr, 2015) . obesity is associated with expanded adipose tissue composed of inflammatory adipocytes and macrophages, both of which express a-fabp and e-fabp. adipocytes predominantly express a-fabp, but e-fabp can be compensatorily upregulated during a-fabp deficiency, indicating a functional overlap between a-fabp and e-fabp in adipocytes. studies using a-fabp and e-fabp double-knockout mice demonstrate that a-fabp and e-fabp are essential in high-fat-diet (hfd)-induced obesity, insulin resistance, and type 2 diabetes, as well as in modulating systemic lipid and glucose metabolism. interestingly, a-fabp and e-fabp are also expressed in macrophages, but neither compensates for the other in a-fabp-or e-fabp-deficient mice, suggesting unique functions of the two fabps in macrophages. emerging studies demonstrate that a-fabp has a distinct expression profile than e-fabp among different macrophage subsets; thus, a-fabp and e-fabp exhibit unique functionality in different macrophages (hao et al., 2018a; zhang et al., 2014) . 
these findings not only explain the uncompensated regulation of a-fabp and e-fabp in macrophages but suggest them as new markers in defining the functional heterogeneity of macrophage subsets. indeed, a-fabp and e-fabp regulate different signaling pathways in macrophages. although e-fabp expression promotes the activation of stat1/2/ifnβ, lta4/ ltb4, rar/cd11c, or nlrp3/il-1β pathways (top left of the right panel), a-fabp expression mainly activates nfκb/il-6, cox2/pge2, cer/cell death, or lxr/scd1 in macrophages (top right of the right panel). for example, in hfd-induced obese mouse models, expression of e-fabp, but not a-fabp, in skin macrophages is essential to the induction of interleukin 1 (il-1)β-mediated skin inflammation (zhang et al., 2015) . by contrast, a-fabp deficiency protects mice against atherosclerosis development, mainly because of its ability to reduce lipid-induced endoplasmic reticulum stress in macrophages (erbay et al., 2009) . moreover, a-fabp is highly expressed in alveolar macrophages in covid-19 patients, which could contribute to obesity-associated severity of covid-19 (liao et al., 2020; richardson et al., 2020) . in tumors, a-fabp expression in macrophages promotes tumor growth and metastasis through inducing tumor-promoting il-6 signaling, whereas e-fabp expression in macrophages enhances type i interferon β (ifnβ) responses to inhibit tumor progression (zhang et al., 2014; hao et al., 2018a) . thus, a-fabp and e-fabp regulate different inflammatory and metabolic pathways, representing functional markers that demonstrate heterogeneous features of tissue macrophages. it is worth noting that fabps are traditionally considered as cytoplasmic proteins that coordinate lipid responses inside cells. for instance, intracellular a-fabp in adipocytes accounts for up to 5% of total cytosolic proteins and is critical in the maintenance of dynamic lipid balance by regulating hsl-mediated lipolysis and pparγmediated lipogenesis. 
recent studies demonstrate that external factors (e.g., hfd, β-ar signaling) are able to induce a-fabp secretion from adipocytes. circulating a-fabp functions as a new adipokine linking obesity-associated diseases, such as enhancing obesity-associated breast cancer development and liver glucose production in diabetes (hao et al., 2018b; cao et al., 2013) (bottom right of the right panel). in addition, mutated tumor cells (e.g., some breast or ovarian cancer cells) ectopically upregulate the expression of a-fabp and/or e-fabp, which in turn promote tumor cell proliferation and metastasis by activating different oncogenic signaling pathways (middle left of the right panel). in summary, accumulating evidence has demonstrated that fabp members not only exert overlapping functions in lipid binding and transport but exhibit unique characteristics in specific cells and tissues as well. further understanding as to how different fabps are specifically regulated in different cells and tissues (e.g., immune cell subsets), as well as the mechanisms regulating cell metabolism and function, will provide insights into the actions of fabps and facilitate their clinical applications in obesity-associated diseases. adipocyte lipid chaperone ap2 is a secreted adipokine regulating hepatic glucose production reducing endoplasmic reticulum stress through a macrophage lipid chaperone alleviates atherosclerosis expression of adipocyte/macrophage fatty acid-binding protein in tumor-associated macrophages promotes breast cancer progression circulating adipose fatty acid binding protein is a new link underlying obesity-associated breast/mammary tumor development metabolic functions of fabps--mechanisms and therapeutic implications single-cell landscape of bronchoalveolar immune cells in patients with covid-19 presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with covid-19 in the we thank dr. 
nicole neuman for the constructive advice during preparation of the paper. this work was supported by nih grants r01ca177679, r01ca180986, and r01ai137324. key: cord-312115-foy3dsq4 authors: sekine, takuya; perez-potti, andré; rivera-ballesteros, olga; strålin, kristoffer; gorin, jean-baptiste; olsson, annika; llewellyn-lacey, sian; kamal, habiba; bogdanovic, gordana; muschiol, sandra; wullimann, david j.; kammann, tobias; emgård, johanna; parrot, tiphaine; folkesson, elin; rooyackers, olav; eriksson, lars i.; henter, jan-inge; sönnerborg, anders; allander, tobias; albert, jan; nielsen, morten; klingström, jonas; gredmark-russ, sara; björkström, niklas k.; sandberg, johan k.; price, david a.; ljunggren, hans-gustaf; aleman, soo; buggert, marcus title: robust t cell immunity in convalescent individuals with asymptomatic or mild covid-19 date: 2020-08-14 journal: cell doi: 10.1016/j.cell.2020.08.017 sha: doc_id: 312115 cord_uid: foy3dsq4 summary sars-cov-2-specific memory t cells will likely prove critical for long-term immune protection against covid-19. we here systematically mapped the functional and phenotypic landscape of sars-cov-2-specific t cell responses in unexposed individuals, exposed family members, and individuals with acute or convalescent covid-19. acute phase sars-cov-2-specific t cells displayed a highly activated cytotoxic phenotype that correlated with various clinical markers of disease severity, whereas convalescent phase sars-cov-2-specific t cells were polyfunctional and displayed a stem-like memory phenotype. importantly, sars-cov-2-specific t cells were detectable in antibody-seronegative exposed family members and convalescent individuals with a history of asymptomatic and mild covid-19. our collective dataset shows that sars-cov-2 elicits robust, broad and highly functional memory t cell responses, suggesting that natural exposure or infection may prevent recurrent episodes of severe covid-19. 
the world changed in december 2019 with the emergence of a new zoonotic pathogen, severe acute respiratory syndrome coronavirus 2 (sars-cov-2), which causes a variety of clinical syndromes collectively termed coronavirus disease 2019 . at present, there is no vaccine against sars-cov-2, and the excessive inflammation associated with severe covid-19 can lead to respiratory failure, septic shock, and ultimately, death (guan et al., 2020; wolfel et al., 2020; wu and mcgoogan, 2020) . the overall mortality rate is 0.5-3.5% (guan et al., 2020; wolfel et al., 2020; wu and mcgoogan, 2020) . however, most people seem to be affected less severely and either remain asymptomatic or develop only mild symptoms during covid-19 (he et al., 2020b; wei et al., 2020; yang et al., 2020) . it will therefore be critical in light of the ongoing pandemic to determine if people with milder forms of covid-19 develop robust immunity against sars-cov-2. global efforts are currently underway to map the determinants of immune protection against sars-cov-2. recent data have shown that sars-cov-2 infection generates near-complete protection against rechallenge in rhesus macaques (chandrashekar et al., 2020) , and similarly, there is limited evidence of reinfection in humans with previously documented covid-19 (kirkcaldy et al., 2020) . further work is therefore required to define the mechanisms that underlie these observations and evaluate the durability of protective immune responses elicited by primary infection with sars-cov-2. most correlative studies of immune protection against sars-cov-2 have focused on the induction of neutralizing antibodies (hotez et al., 2020; robbiani et al., 2020; seydoux et al., 2020; wang et al., 2020) . however, antibody responses are not detectable in all patients, especially those with less severe forms of covid-19 (long et al., 2020; mallapaty, 2020; woloshin et al., 2020) . 
previous work has also shown that memory b cell responses tend to be short-lived after infection with sars-cov-1 (channappanavar et al., 2014; tang et al., 2011). in contrast, memory t cell responses can persist for many years (le bert et al., 2020; tang et al., 2011; yang et al., 2006) and, in mice, protect against lethal challenge with sars-cov-1 (channappanavar et al., 2014). ni et al., 2020). it has nonetheless remained unclear to what extent various features of the t cell immune response associate with serostatus and the clinical course of covid-19. to address this knowledge gap, we characterized sars-cov-2-specific cd4+ and cd8+ t cells in outcome-defined cohorts of donors (total n = 206) from sweden, which has experienced a more open spread of covid-19 than many other countries in europe (habib, 2020). our preliminary analyses showed that the absolute numbers and relative frequencies of cd4+ and cd8+ t cells were unphysiologically low in patients with acute moderate or severe covid-19 (figure 1a and figure s1a, b). this finding has been reported previously (he et al., 2020a; liu et al., 2020). we then used a 29-color flow cytometry panel to assess the phenotypic landscape of these immune perturbations in direct comparisons with healthy blood donors and individuals who had recovered from mild covid-19 acquired early during the pandemic (february to march 2020). the patient demographics are described in the star methods section. none of the parameters were found to be prognostic for the outcome of the disease severity. the to extend these findings, we concatenated all memory cd4+ t cells and all memory cd8+ t cells from healthy blood donors, convalescent individuals, and patients with acute moderate or severe covid-19.
phenotypically related cells were identified using the clustering algorithm phenograph, and marker expression patterns were visualized using the dimensionality reduction algorithm uniform manifold approximation and projection (umap). distinct topographical clusters were apparent in each group (figure 1d and figure s2a, b). in particular, memory cd8+ t cells from patients with acute moderate or severe covid-19 expressed a distinct cluster of markers associated with activation and the cell cycle, including cd38, hla-dr, ki-67, and pd-1 (figure 1d and figure s2a). a similar pattern was observed among memory cd4+ t cells from patients with acute moderate or severe covid-19 (figure s2b). these findings were confirmed via manual gating of the flow cytometry data (figure 1e). correlative analyses further demonstrated that the activated/cycling phenotype was strongly associated with the presence of sars-cov-2 igg levels, as well as various clinical parameters, including age, hemoglobin concentration, platelet count, and plasma levels of alanine aminotransferase, albumin, d-dimer, fibrinogen, and myoglobin (figure s2c, d), but less strongly associated with plasma levels of various inflammatory markers, including interleukin (il)-1β, il-10, and tumor necrosis factor (tnf) (table s1). collectively, these data suggest that the combination of activation markers on t cells potentially marks a more robust early sars-cov-2-specific adaptive immune response in covid-19. unphysiologically high expression frequencies of cd38, potentially driven by a highly inflammatory environment, were consistently observed among memory cd8+ t cells from patients with acute moderate or severe covid-19 (figure s3a, b).
in line with these data, we found that cd8 + t cells specific for cytomegalovirus (cmv) or epstein-barr virus (ebv) more commonly expressed cd38, but not hla-dr, ki-67, or pd-1, in patients with acute moderate or severe covid-19 compared with convalescent individuals and healthy blood donors, indicating limited bystander activation and proliferation during the early phase of infection with sars-cov-2 ( figure 2a , b and figure s3c ). of note, actively proliferating cd8 + t cells, defined by the expression of ki-67, exhibited a predominant ccr7 − cd27 + cd28 + cd45ra − cd127 − phenotype in patients with acute moderate or severe covid-19 ( figure s3d ), as reported previously in the context of vaccination and other viral infections (buggert et al., 2018; miller et al., 2008) . on the basis of these findings, we used overlapping peptides spanning the immunogenic domains of the sars-cov-2 spike, membrane, and nucleocapsid proteins to stimulate peripheral blood mononuclear cells (pbmcs) from patients with acute moderate or severe covid-19. a vast majority of responding cd4 + and cd8 + t cells displayed an activated/cycling (cd38 + hla-dr + ki67 + pd-1 + ) phenotype ( figure 2c ). these results were confirmed using an activation-induced marker (aim) assay to measure the upregulation of cd69 and 4-1bb (cd137), suggesting that most cd38 + pd-1 + cd8 + t cells were specific for sars-cov-2 ( figure 2d ). in further experiments, we used hla class i tetramers as probes to detect cd8 + t cells specific for predicted optimal epitopes from sars-cov-2 ( figure s3e ) (table s2) . a vast majority of tetramer + cd8 + t cells in the acute phase of infection, but not during convalescence, displayed an activated/cycling phenotype ( figure 2e ). 
in general, early sars-cov-2-specific cd8 + t cell populations were characterized by the expression of immune activation molecules (cd38, hla-dr, ki-67), inhibitory receptors (pd-1, tim-3), and cytotoxic molecules (granzyme b, perforin), whereas convalescent phase sars-cov-2-specific cd8 + t cell populations were skewed toward an early differentiated memory (ccr7 + cd127 + cd45ra −/+ tcf1 + ) phenotype ( figure 2f ). importantly, the expression frequencies of ccr7 and cd45ra among sars-cov-2-specific cd8 + t cells were positively correlated with the number of symptom-free days after infection (ccr7: r = 0.79, p = 0.001; cd45ra: r = 0.70, p = 0.008), whereas the expression frequency of granzyme b among sars-cov-2-specific cd8 + t cells was inversely correlated with the number of symptom-free days after infection (r = 0.70, p = 0.007) ( figure 2g ). time from exposure was therefore associated with the emergence of stem-like memory sars-cov-2-specific cd8 + t cells. on the basis of these observations, we quantified functional sars-cov-2-specific memory t cell responses across five distinct cohorts, including healthy individuals who donated blood either before or during the pandemic, family members who shared a household with convalescent individuals and were exposed at the time of symptomatic disease, and individuals in the convalescent phase after mild or severe covid-19. we detected potentially cross-reactive t cell responses directed against either the spike or membrane proteins in a total of 28% of the healthy individuals who donated blood before the pandemic, consistent with previous reports le bert et al., 2020) , but nucleocapsid reactivity was notably absent in this cohort ( figure 3a and figure s4a , b). the highest response frequencies against any of the three proteins were observed in convalescent individuals who experienced severe covid-19 (100%). 
progressively lower response frequencies were observed in convalescent individuals with a history of mild covid-19 (87%), exposed family members (67%), and healthy individuals who donated blood during the pandemic (46%) ( figure 3a ). to assess the functional capabilities of sars-cov-2-specific memory cd4 + and cd8 + t cells in convalescent individuals, we stimulated pbmcs with the overlapping spike, membrane, and nucleocapsid peptide sets and measured a surrogate marker of degranulation (cd107a) along with the production of interferon (ifn)-γ, il-2, and tnf ( figure 3b , c). sars-cov-2-specific cd4 + t cells predominantly expressed ifn-γ, il-2, and tnf ( figure 3b ), whereas sars-cov-2-specific cd8 + t cells predominantly expressed ifn-γ and mobilized cd107a ( figure 3c ). we then used the aim assay to determine the functional polarization of sars-cov-2-specific cd4 + t cells. interestingly, spike-specific cd4 + t cells were skewed toward a circulating t follicular helper (ctfh) profile, suggesting a key role in the generation of potent antibody responses, whereas membrane-specific and nucleocapsid-specific cd4 + t cells were skewed toward a th1 or a th1/th17 profile ( figure 3d and figure s5a , b). in a final series of experiments, we assessed the recall capabilities of sars-cov-2specific cd4 + and cd8 + t cells in convalescent individuals, exposed family members, and healthy blood donors. proliferative responses were identified by tracking the progressive dilution of a cytoplasmic dye (celltrace violet; ctv) after stimulation with the overlapping spike, membrane, and nucleocapsid peptide sets, and functional responses to the same antigens were evaluated 5 days later by measuring the production of ifn-γ (blom et al., 2013; buggert et al., 2014) . 
anamnestic responses in the cd4+ and cd8+ t cell compartments, quantified as a function of ctv-low ifn-γ+ events (figure 4a), were detected in most convalescent individuals (mc = 96%, sc = 100%) and exposed family members (92%) (figure 4b, c). sars-cov-2-specific cd4+ t cell responses were proportionately larger overall than the corresponding sars-cov-2-specific cd8+ t cell responses (ef = 1.8-fold, mc = 1.4-fold, and sc = 1.8-fold larger, respectively) (figure 4d). in addition, most ifn-γ+ sars-cov-2-specific cd4+ t cells produced tnf, and most ifn-γ+ sars-cov-2-specific cd8+ t cells produced granzyme b and perforin (figure 4e). serological evaluations revealed a strong positive correlation between igg responses directed against the spike protein of sars-cov-2 and igg responses directed against the nucleocapsid protein of sars-cov-2 (r = 0.82, p < 0.001) (figure s5c). moreover, sars-cov-2-specific cd4+ and cd8+ t cell responses were present in seronegative individuals, albeit at lower frequencies compared with seropositive individuals (41% versus 99%, respectively) (figure 4f). these discordant responses were nonetheless pronounced in some convalescent individuals with a history of mild covid-19 (3/31), exposed family members (9/28), and healthy individuals who donated blood during the pandemic (5/31) (figure 4f and figure s5d), often targeting both the internal (nucleocapsid) and surface antigens (spike and/or membrane) of sars-cov-2 (figure 4g). higher frequencies of t cell responses were also found in exposed seronegative family members compared with unexposed donors (figure s5e). potent memory t cell responses were therefore elicited in the absence or presence of circulating antibodies, consistent with a non-redundant role as key determinants of immune protection against covid-19 (chandrashekar et al., 2020).
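as a quick arithmetic aid, the discordant-response fractions quoted above can be converted to percentages so that the three groups are easier to compare. the sketch below is illustrative only: the group labels and the helper name `percent` are our own, and the counts are simply copied from the text.

```python
# counts copied from the text above as (responders, denominator as reported);
# the group labels are illustrative and not taken from the study's data files
discordant = {
    "mild convalescent": (3, 31),
    "exposed family": (9, 28),
    "2020 blood donors": (5, 31),
}


def percent(responders, total):
    """return responders/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * responders / total, 1)


percentages = {group: percent(r, n) for group, (r, n) in discordant.items()}

for group, value in percentages.items():
    print(f"{group}: {value}%")
```

on these counts, the fraction is roughly three times higher among exposed family members than among convalescent individuals with a history of mild covid-19.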
we are currently facing the biggest global health emergency in decades, namely the devastating outbreak of covid-19. in the absence of a protective vaccine, it will be critical to determine if exposed and/or infected people, especially those with asymptomatic or very mild forms of the disease who likely act inadvertently as the major transmitters, develop robust adaptive immunity against sars-cov-2 (long et al., 2020). in this study, we used a systematic approach to map cellular and humoral immune responses against sars-cov-2 in patients with acute moderate or severe covid-19, individuals in the convalescent phase after mild or severe covid-19, exposed family members, and healthy individuals who donated blood before (2019) or during the pandemic (2020). individuals in the convalescent phase after mild covid-19 were traced after returning to sweden from endemic areas (mostly northern italy). these donors exhibited robust memory t cell responses months after infection, even in the absence of detectable circulating antibodies specific for sars-cov-2, indicating a previously unanticipated degree of population-level immunity against covid-19. we found that t cell activation, characterized by the expression of cd38, was a hallmark of acute covid-19. similar findings have been reported previously in the absence of specificity data (huang et al., 2020; thevarajan et al., 2020; wilk et al., 2020). many of these t cells also expressed hla-dr, ki-67, and pd-1, indicating a combined activation/cycling phenotype that correlated with an early strong immune response, including an early sars-cov-2-specific igg response, and, to a lesser extent, with plasma levels of various inflammatory markers. our data also showed that many activated/cycling t cells in the acute phase were functionally replete and specific for sars-cov-2.
equivalent functional profiles have been observed early after immunization with successful vaccines (blom et al., 2013; miller et al., 2008; precopio et al., 2007). accordingly, the expression of multiple inhibitory receptors, including pd-1, likely indicates early activation rather than exhaustion (zheng et al., 2020a; zheng et al., 2020b). virus-specific memory t cells have been shown to persist for many years after infection with sars-cov-1 (le bert et al., 2020; tang et al., 2011; yang et al., 2006). in line with these observations, we found that sars-cov-2-specific t cells acquired an early differentiated memory (ccr7+ cd127+ cd45ra−/+ tcf1+) phenotype in the convalescent phase, as reported previously in the context of other viral infections and successful vaccines (blom et al., 2013; demkowicz et al., 1996; fuertes marraco et al., 2015; precopio et al., 2007). this phenotype has been associated with stem-like properties (betts et al., 2006; blom et al., 2013; demkowicz et al., 1996; fuertes marraco et al., 2015; precopio et al., 2007). accordingly, we found that sars-cov-2-specific t cells generated anamnestic responses to cognate antigens in the convalescent phase, characterized by extensive proliferation and polyfunctionality. of particular note, we detected similar memory t cell responses directed against the internal (nucleocapsid) and surface proteins (membrane and/or spike) in some individuals lacking detectable circulating antibodies specific for sars-cov-2. indeed, about twice as many healthy individuals who donated blood during the pandemic generated memory t cell responses in the absence of detectable circulating antibody responses, implying that seroprevalence as an indicator may underestimate the extent of population-level immunity against sars-cov-2. our study was cross-sectional in nature and limited in terms of clinical follow-up and overall donor numbers in each outcome-defined group.
it therefore remains to be determined if robust memory t cell responses in the absence of detectable circulating antibodies can protect against severe forms of covid-19. this scenario has nonetheless been inferred from previous studies of mers and sars-cov-1 (channappanavar et al., 2014; li et al., 2008; zhao et al., 2017; zhao et al., 2016), both of which have been shown to induce potent memory t cell responses that persist while antibody responses wane (alshukairi et al., 2016; shin et al., 2019; tang et al., 2011). notably, the waning of antibodies, as observed in sars-cov-2 infection (ibarrondo et al., 2020; long et al., 2020), is a natural phenomenon following coronavirus infections (callow et al., 1990). the fact that memory b cells (juno et al., 2020) and robust t cell memory are formed after sars-cov-2 infection suggests that potent adaptive immunity is maintained to provide protection against severe re-infection. in line with these observations, none of the convalescent individuals in this study, including those with previous mild disease, have experienced further episodes of covid-19. of note, we detected cross-reactive t cell responses against spike or membrane in 28% of the unexposed healthy blood donors, consistent with a high degree of preexisting immune responses potentially induced by other coronaviruses (braun et al., 2020; grifoni et al., 2020; le bert et al., 2020). data on the cross-reactive responses were based on cryopreserved samples, which could have a negative impact on the frequency of t cell responders among sars-cov-2-unexposed donors (owen et al., 2007). although we detected generally broader and stronger t cell responses in seronegative convalescent and exposed individuals compared with unexposed donors, it remains possible that a fraction of the anamnestic sars-cov-2-specific t cell response was initially induced by seasonal coronaviruses.
the biological relevance of cross-reactive t cell responses remains unclear. however, it is tempting to speculate that such responses may provide at least partial protection against sars-cov-2, and may contribute to differences in disease severity, given that pre-existing t cell immunity has been associated with beneficial outcomes after challenge with the pandemic influenza virus strain h1n1 (sridhar et al., 2013; wilkinson et al., 2012). collectively, our data provide a functional and phenotypic map of sars-cov-2-specific t cell immunity across the full spectrum of exposure, infection, and disease. the observation that many individuals with asymptomatic or mild covid-19, after sars-cov-2 exposure or infection, generated highly durable and functionally replete memory t cell responses, not uncommonly in the absence of detectable humoral responses, further suggests that natural exposure or infection could prevent recurrent episodes of severe covid-19.

the authors declare that they have no competing financial interests, patents, patent applications, or material transfer agreements associated with this study.

further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, marcus buggert (marcus.buggert@ki.se).

aliquots of synthesized tetramers and monomers utilized in this study will be made available upon request. there are restrictions to the availability of the monomers due to cost and limited quantity.

the published article includes all data generated during this study. all codes are freely available at source.

maximal disease severity was assessed using the nih ordinal scale and sequential organ failure assessment (sofa) (beigel et al., 2020; singer et al., 2016). the nih ordinal scale was defined as follows: (1) donors were assigned to one of seven groups for the purposes of this study.
as: patients with acute severe disease requiring hospitalization in the high dependency or intensive care unit, with low-flow oxygen support (>10 l/min), high-flow oxygen support, or invasive mechanical ventilation (n = 17). these patients had a median nih ordinal scale score of 7 (iqr 6-7) and a median sofa score of 6 (iqr 3-6) at the time of sampling 12-17 days after disease onset (47% were viremic, and 82% were antibody-seropositive for sars-cov-2). am: patients with acute moderate disease requiring hospitalization and low-flow oxygen support (0-3 l/min; n = 10). these patients had a median nih ordinal scale score of 5 (iqr 5-5) and a median sofa score of 1 (iqr 1-1) at the time of sampling 11-14 days after disease onset (40% were viremic, and 50% were antibody-seropositive for sars-cov-2). sc: individuals in the convalescent phase after severe disease (n = 26). samples were collected 42-58 days after disease onset, corresponding to 3-21 days after resolution of symptoms (100% were antibody-seropositive for sars-cov-2). mc: individuals in the convalescent phase after mild disease (n = 40). samples were collected 49-64 days after disease onset, corresponding to 25-53 days after resolution of symptoms (85% were antibody-seropositive for sars-cov-2). exp: family members who shared a household with donors in groups mc or sc (n = 30). these individuals were exposed at the time of symptomatic disease (21% remained asymptomatic, and 64% were antibody-seropositive for sars-cov-2). 2020 bd: table s3 , and immunological assay breakdowns are summarized in table s4 . 
pbmcs were isolated from venous blood samples via standard density gradient

peptides

peptides corresponding to known optimal epitopes derived from cmv (pp65) and ebv (bzlf1 and ebna-1), overlapping peptides spanning the immunogenic domains of the sars-cov-2 spike (prot_s), membrane (prot_m), and nucleocapsid proteins (prot_n), and optimal peptides for the manufacture of hla class i tetramers were synthesized at >95% purity. lyophilized peptides were reconstituted at a stock concentration of 10 mg/ml in dmso and further diluted to 100 µg/ml in pbs. peptides were selected from full-length sars-cov-2 sequences spanning 82 different strains from 13 countries (national center for biotechnology information). the predicted binding affinities of conserved 9mer peptides for hla-a*0201 and hla-b*0702 were determined using netmhcpan version 4.1 (reynisson et al., 2020). binders were defined by a threshold ic50 value of 500 nm. strong binders were defined by a % rank <0.5, and weak binders were defined by a % rank <2 (table s2). a total of 13 strong binders were identified for tetramer generation (table s2). hla class i tetramers were generated as described previously (price et al., 2005).

pbmcs were labeled with ctv (0.5 µm; thermo fisher scientific), resuspended in complete medium at 1 × 10^7 cells/ml, and cultured at 1 × 10^6 cells/well in 96-well u-bottom plates (corning) with the relevant peptides (each at 1 µg/ml) in the presence of anti-cd28/cd49d (3 µl/ml; clone l293/l25; bd biosciences) and il-2 (10 iu/ml; peprotech). negative control wells lacked peptides, and positive control wells included seb (0.5 µg/ml; sigma-aldrich) or plate-bound anti-cd3 (1 µg/ml; clone okt3; biolegend). functional assays were performed as described above after incubation for 5 days at 37°c.
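the % rank cutoffs used above to label predicted hla class i binders can be written as a small helper. this is a minimal sketch under stated assumptions: `classify_binder` is a hypothetical function of our own and not part of netmhcpan, and the peptide names and % rank values are made-up placeholders rather than values from table s2.

```python
STRONG_RANK_CUTOFF = 0.5  # strong binders: % rank < 0.5 (as stated in the text)
WEAK_RANK_CUTOFF = 2.0    # weak binders: % rank < 2 (as stated in the text)


def classify_binder(percent_rank):
    """label a predicted peptide-hla pair from its % rank score."""
    if percent_rank < 0:
        raise ValueError("% rank cannot be negative")
    if percent_rank < STRONG_RANK_CUTOFF:
        return "strong"
    if percent_rank < WEAK_RANK_CUTOFF:
        return "weak"
    return "non-binder"


# placeholder predictions: hypothetical peptide names and % rank values
predictions = {"peptide_a": 0.12, "peptide_b": 1.4, "peptide_c": 7.9}

labels = {name: classify_binder(rank) for name, rank in predictions.items()}
strong_binders = [name for name, label in labels.items() if label == "strong"]

print(labels)
print(strong_binders)
```

strict inequalities are used because the text defines both classes with "<", so a peptide scoring exactly 0.5 would be labeled weak here.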
pbmcs were resuspended in complete medium at 1 × 10^7 cells/ml and cultured at 1

trucount absolute counts were obtained using multitest 6-color tbnk reagent with trucount tubes (bd biosciences). samples were fixed with 2% pfa for 2 hr prior to acquisition. absolute cd3+ cell counts were calculated using the following formula: (# cd3+ events acquired × total # beads × 1000) / (# beads acquired × volume of whole blood stained in µl). cd4+ and cd8+ cell counts were computed from the respective frequencies relative to cd3+ cells.

analyses were performed using scikit-learn version 0.22.1 in python. data were normalized using sklearn.preprocessing.StandardScaler in the same package to generate z-scores for pca.

pbmcs were rested overnight in complete medium and seeded at 2 × 10^5 cells/well

antibody response and disease severity in healthcare worker mers survivors. emerg infect dis hiv nonprogressors preferentially maintain highly functional hiv-specific cd8+ t cells temporal dynamics of the primary human t cell response to yellow fever virus 17d as it matures from an effector-to a memory-type response identification and characterization of hiv-specific resident memory cd8(+) t functional avidity and il-2/perforin production is linked to the emergence of mutations within hla-b*5701-restricted epitopes and hiv-1 disease progression the time course of the immune response to experimental coronavirus infection of man sars-cov-2 infection protects against rechallenge in rhesus macaques.
science virus-specific memory cd8 t cells provide substantial protection from lethal severe acute respiratory syndrome coronavirus infection human cytotoxic t-cell memory: long-lived responses to vaccinia virus long-lasting stem cell-like memory cd8+ t cells with a naive-like profile upon yellow fever vaccination targets of t cell responses to sars-cov-2 coronavirus in humans with covid-19 disease and unexposed individuals clinical characteristics of coronavirus disease 2019 in china has sweden's controversial covid-19 strategy been successful? the clinical course and its correlated immune status in covid-19 pneumonia covid-19 vaccines: neutralizing antibodies and the alum advantage clinical features of patients infected with 2019 novel coronavirus in wuhan rapid decay of anti-sars-cov-2 antibodies in persons with mild covid-19 humoral and circulating follicular helper t cell responses in recovered patients with covid-19 covid-19 and postinfection immunity: limited evidence, many remaining questions sars-cov-2-specific t cell immunity in cases of covid-19 and sars, and uninfected controls t cell responses to whole sars coronavirus in humans longitudinal characteristics of lymphocyte responses and cytokine profiles in the peripheral blood of sars-cov-2 infected patients will antibody tests for the coronavirus really change everything? selective and cross-reactive sars-cov-2 t cell epitopes in unexposed humans human effector and memory cd8+ t cell responses to smallpox and yellow fever vaccines detection of sars-cov-2-specific humoral and cellular immunity in covid-19 convalescent individuals loss of t cell responses following long-term cryopreservation diagnostic performances and thresholds: the key to harmonization in serological sars-cov-2 assays? 
immunization with vaccinia virus induces polyfunctional and phenotypically distinctive cd8(+) t cell responses convergent antibody responses to sars-cov-2 infection in convalescent individuals immune responses to middle east respiratory syndrome coronavirus during the acute and convalescent phases of human infection cellular immune correlates of protection against symptomatic pandemic influenza lack of peripheral memory b cell responses in recovered patients with severe acute respiratory syndrome: a six-year follow-up study breadth of concomitant immune responses prior to patient recovery: a case report of non-severe covid-19 a human monoclonal antibody blocking sars-cov-2 infection presymptomatic transmission of sars-cov-2 -singapore a single-cell atlas of the peripheral immune response in patients with severe covid-19 preexisting influenza-specific cd4+ t cells correlate with disease protection against influenza challenge in humans virological assessment of hospitalized patients with covid-2019 false negative tests for sars-cov-2 infection -challenges and implications characteristics of and important lessons from the coronavirus disease 2019 (covid-19) outbreak in china: summary of a report of 72314 cases from the chinese center for disease control and prevention long-lived effector/central memory t-cell responses to severe acute respiratory syndrome coronavirus (sars-cov) s antigen in recovered sars patients comparison of clinical characteristics of patients with asymptomatic vs symptomatic coronavirus disease recovery from the middle east respiratory syndrome is associated with antibody and t-cell responses airway memory cd4(+) t cells mediate protective immunity against emerging respiratory coronaviruses elevated exhaustion levels and reduced functional diversity of t cells in peripheral blood may predict severe progression in covid-19 patients functional exhaustion of antiviral lymphocytes in covid-19 patients acute phase sars-cov-2-specific t cells display 
an activated cytotoxic phenotype 2. broad and polyfunctional sars-cov-2-specific t cell responses in convalescent phase 3. detection of sars-cov-2-specific t cell responses also in seronegative individuals etoc blurb buggert and colleagues provide a phenotypic and functional map of sars-cov-2-specific t cells across the full spectrum of exposure, infection, and covid-19 severity. they observe that sars-cov-2-specific t cells generate a broad, robust and functionally replete response in convalescent individuals we express our gratitude to all donors, health care personnel, study coordinators, administrators, and laboratory managers involved in this work. key: cord-307021-uppekume authors: crooke, elliott; guthrie, brenda; lecker, stewart; lill, roland; wickner, william title: proompa is stabilized for membrane translocation by either purified e. coli trigger factor or canine signal recognition particle date: 1988-09-23 journal: cell doi: 10.1016/0092-8674(88)90115-8 sha: doc_id: 307021 cord_uid: uppekume abstract we have isolated large amounts of e. coli outer-membrane protein a precursor (proompa). purified proompa is active in membrane assembly, and this assembly is saturable with respect to the precursor protein. a proompa-sepharose matrix allows affinity isolation of trigger factor, a soluble, 63,000 dalton monomeric protein that stabilizes proompa in assembly competent form. comparison of trigger factor's amino-terminal sequence with those in a computer data bank and with those encoded by sec genes, as well as groel and heat shock gene dnak, suggests that trigger factor is encoded by a previously undescribed gene. trigger factor and proompa form a 1:1 complex that can be isolated by gel filtration. purified canine signal recognition particle (srp) can also stabilize proompa for membrane insertion. this post-ribosomal activity of srp suggests a unifying theme in protein translocation mechanisms. 
the salient features of protein translocation across membranes have been described during the last decade (reviewed in wickner and lodish, 1985). these include the presence of a leader sequence, a requirement for metabolic energy, and conformational change of the precursor proteins. protein insertion into the mammalian endoplasmic reticulum has been viewed as a strictly cotranslational event (blobel, 1980) coupled to translation by the action of signal recognition particle (srp; walter and blobel, 1981). although mammalian and bacterial presecretory proteins share a common leader peptide structure, bacterial secretion is not coupled to translation, and many proteins are exported largely posttranslationally. although bacterial membrane assembly has been reconstituted with purified components for one protein, the m13 procoat protein (ohno-iwashita and wickner, 1983), this protein is unusually simple in that its membrane insertion does not require the functioning of sec- and prl-encoded proteins (wolfe et al., 1985). more recently, an in vitro translation-translocation reaction has been described for sec- and prl-dependent proteins (müller and blobel, 1984; rhoads et al., 1984). in this reaction, presecretory proteins, synthesized in an escherichia coli extract, translocate into the lumen of inverted, sealed plasma membrane vesicles. as clearly established in vivo, the in vitro translocation is not coupled to ongoing polypeptide chain growth and requires both atp hydrolysis (chen and tai, 1985) and the membrane electrochemical potential (geller et al., 1988). we have undertaken an enzymological approach to resolve and reconstitute the components of this translocation reaction. [35s]proompa was purified 1500-fold in 8 m urea from an in vitro protein synthesis reaction (crooke and wickner, 1987). upon dilution, the proompa renatures into a form competent for membrane assembly (crooke et al., 1988).
however, it rapidly loses this competence unless a soluble protein, which we have termed trigger factor, is present. trigger factor stabilizes proompa in a form that is competent for membrane assembly. indirect evidence has been presented which indicates that trigger factor and proompa may form a complex (crooke and wickner, 1987). we now report the isolation of homogeneous proompa and trigger factor. an important step in the purification of trigger factor is affinity chromatography on proompa-sepharose, confirming our hypothesis that these proteins can form a complex. n-terminal sequence analysis indicates that trigger factor is not one of the proteins encoded by the well-studied sec, prl, or heat shock genes. studies reported here and in an accompanying article (lill et al., 1988) show interactions of trigger factor with ribosomes, proompa, and membranes and suggest a cyclic mode of its action. the similarity of this trigger factor cycle to the cyclic action of canine srp led us to the finding that srp, which had been thought to act exclusively in the context of the ribosome, can also stabilize purified proompa for membrane insertion. these studies suggest that a primary function of trigger factor, and possibly srp, is to stabilize presecretory proteins for membrane translocation.

of proompa

a first step toward the isolation of large amounts of proompa was its overproduction in vivo. expression of the ompa gene under the trc promoter in a multicopy plasmid causes substantial overproduction of proompa and ompa (figure 1, lanes 1 and 2). cells were disrupted and incubated with sarkosyl to solubilize the membrane proteins. as described by freudl et al. (1986), the proompa remains aggregated in this detergent extract and can be separated from the other cellular proteins simply by centrifugation. proompa was solubilized from this "washed pellet" fraction with 8 m urea. this procedure resulted in a 5-fold purification and 35% yield of proompa.
in a typical preparation, 422 mg of purified proompa (figure 1, lanes 3-9) was obtained from 62 g (wet weight) of cells.

figure 1. strain w3110 bearing the trc-ompa plasmid was grown at 37°c in m9 minimal medium with 0.5% glucose, 0.4% casamino acids (difco), and 100 µg/ml ampicillin. at an a600 of 1.0, isopropyl thiogalactoside was added to 0.5 mm and growth was continued for 2 hr. cells were collected by centrifugation, suspended in buffer (50 mm tris-hcl [ph 7.5], 10% sucrose), frozen as small nuggets by pipetting the suspension into liquid nitrogen, and stored at −80°c. to prepare proompa, frozen cell suspensions (128 g) were thawed, mixed with lysozyme (0.9 mg/ml), and incubated for 5 min at 23°c. after addition of mgcl2 (5 mm) and dnaase i (4 µg/ml), the suspension was incubated an additional 5 min at 23°c. extraction buffer (360 ml; 1.5% [wt/vol] sarkosyl, 50 mm citrate, titrated to ph 6.0 with solid na2hpo4) was added and the mixture subjected to dounce homogenization. after 30 min at 23°c, the solution was centrifuged (27,000 × g, 15 min at 23°c). this extraction was repeated twice. the final pellet was suspended in 22 ml of buffer c (see experimental procedures) by continuous vortexing for 5 min at 23°c, followed by centrifugation for 1 min at 27,000 × g. samples were analyzed by sds-page with coomassie brilliant blue r250 staining. the leftmost two lanes show 5 µg and 15 µg of the unfractionated crude lysate. the other lanes contain the indicated amounts of purified proompa.

the purified proompa can be renatured for membrane assembly by dilution of the urea. proompa in 8 m urea was diluted into a reaction mixture that contained atp and inverted, sealed e. coli plasma membrane vesicles (rhoads et al., 1984). after incubation at 40°c to allow membrane assembly, samples were chilled on ice and assayed for translocated ompa by incubating the sealed membrane vesicles with proteinase k.
each reaction mixture was subjected to sds-polyacrylamide gel electrophoresis (sds-page) and immunoblot analysis with antiserum to ompa (figure 2). when 0.22 µg of proompa was assayed with 1.5 µg of membrane vesicles (figure 2, lane 7), approximately 10% was translocated across the membrane and processed to mature ompa (compare with standard in lane 9). translocation was dependent on atp (see lanes 2, 4, 6, and 8; no atp). the total translocation increased with increasing amounts of proompa in the assay (figure 2, lanes 5 and 3). at the highest levels of proompa, the translocation reached a plateau level, suggesting that this concentration of proompa was saturating an essential membrane element. to test this, varying amounts of proompa were mixed with a constant amount of [35s]proompa in 8 m urea, then diluted with membranes and assayed for translocation ( […] sively inhibits the translocation of proompa. the translocation of proompa into these vesicles is thus both saturable and energy dependent. as a control, we determined that addition of various amounts of nonradioactive proompa to the assembly reactions after translocation of [35s]proompa was complete had no effect on the protease protection assay (figure 3, lanes 5, 9, and 13). as expected, ompa translocates poorly or not at all under these reaction conditions (data not shown).

figure 2. to present a low background for the assembly assay, inner-membrane vesicles were prepared from e. coli jf699, a strain deficient in ompa. each translocation reaction contained 1.5 µg of membrane protein. the indicated amount of proompa in 1 µl of buffer c was diluted into an ice-cold translocation mixture (80 µl), allowed to translocate for 20 min at 40°c, and assayed for translocation by protease inaccessibility (see experimental procedures). atp and nadh were omitted from the incubation in even-numbered lanes. samples were analyzed by sds-page and by immunoblot with antiserum to ompa. lane 9 is a standard of 0.021 µg of proompa.
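the estimate of roughly one-tenth translocated for figure 2, lane 7 can be checked against the lane 9 standard with simple arithmetic. the sketch below assumes that the translocated band is comparable in intensity to the 0.021 µg standard and that both amounts are in micrograms; the function name `percent_translocated` is our own.

```python
def percent_translocated(translocated_ug, input_ug):
    """percentage of the input precursor recovered as protease-protected ompa."""
    if input_ug <= 0:
        raise ValueError("input amount must be positive")
    return 100.0 * translocated_ug / input_ug


input_ug = 0.22      # proompa added to the lane 7 reaction
standard_ug = 0.021  # lane 9 standard, taken here as matching the translocated band

estimate = percent_translocated(standard_ug, input_ug)
print(f"{estimate:.1f}% translocated")  # prints "9.5% translocated"
```

a band matching the standard therefore corresponds to just under 10% of the input, in line with the approximate figure quoted in the text.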
figure 3. the indicated amounts of purified proompa (in 1 µl) were added to each incubation, either from the beginning of the translocation reaction (time 0) or after 20 min of the 23 min incubation. incubations were performed in either the presence or absence of atp and nadh, as indicated. lane 15 is a standard of proompa.

purification of trigger factor

we have previously reported indirect evidence that trigger factor forms a complex with proompa (crooke and wickner, 1987). with large amounts of pure proompa now available, we covalently linked proompa to a cnbr-activated sepharose column to create an affinity resin for the isolation of trigger factor. the resin was suspended in crude soluble protein extract (s40) from e. coli in the presence of 8 m urea, then dialyzed to remove the urea. these conditions were chosen to mimic those in which the trigger factor in s40 was shown to support renaturation of radiochemically pure [35s]proompa (crooke and wickner, 1987). the dialyzed suspension was poured into a column, rinsed with buffer, and step-eluted with buffer containing 8 m urea. this affinity step resulted in a 23-fold purification of trigger factor with a 55% yield. further purification was achieved by successive ion exchange chromatography with s-sepharose fast flow cation exchange resin and mono q anion exchange fplc. the overall purification, in three steps, is 150-fold with a 54% yield (table 1; see experimental procedures for trigger factor activity assay). analysis of trigger factor at each step of the purification by sds-page and silver staining (figure 4a) shows that the purified protein (lane 5) is a single polypeptide of apparent size 63,000 daltons. it was recovered as a single, sharp, symmetric peak on the final step of ion exchange chromatography (figure 4b).
the 63,000 dalton polypeptide eluted at fraction 28 (figure 4c), coincident with the trigger factor activity of promoting proompa translocation into membrane vesicles (figure 4d). while minor bands of lower molecular weight can be detected, these polypeptides are not consistently seen (see, for example, figure 3a of lill et al., 1988) and do not form a complex with proompa (see figure 7). gel filtration analysis shows that the prominent polypeptide (figure 4c) and trigger factor activity co-elute with an apparent size of 73,000 daltons (figure 5), indicating that trigger factor is a monomeric protein. automated edman sequence analysis showed that the amino terminus of trigger factor has the sequence met-gln-val-ser-val-glu-thr-thr-gln-gly-leu-gly-. this sequence does not correspond to any of the published sec- or prl-encoded proteins, to heat shock proteins dnak (bardwell and craig, 1984) or c62.5 (bardwell and craig, 1987), or to groel (hemmingsen et al., 1988). it does not correspond to any protein currently listed in the 1987 update of the psq data base (dayhoff, 1979), which currently lists approximately 500 e. coli proteins.

of purified trigger factor

aliquots of purified trigger factor were incubated with proompa-sepharose and ompa-sepharose and assayed for binding. trigger factor was also incubated with sepharose with covalently bound m13 procoat leader peptide, a leader typical in structure (von heijne, 1983) that functions in vivo to promote ompa secretion (kuhn et al., 1987). trigger factor bound to proompa-sepharose but not to the immobilized ompa or leader peptide (e. c., unpublished observation), suggesting either that the trigger factor recognizes features of the leader as well as the mature protein or that proper recognition of a domain of proompa requires the presence of the intact precursor protein.
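the monomer assignment earlier in this section rests on comparing the apparent native size from gel filtration (73,000 daltons) with the polypeptide size from sds-page (63,000 daltons). a minimal sketch of that comparison follows; the helper `estimated_subunits` and the rounding rule are our own simplification, and real gel filtration estimates also depend on molecular shape.

```python
def estimated_subunits(native_mass_da, subunit_mass_da):
    """return (mass ratio, nearest whole number of subunits, at least 1)."""
    if native_mass_da <= 0 or subunit_mass_da <= 0:
        raise ValueError("masses must be positive")
    ratio = native_mass_da / subunit_mass_da
    return ratio, max(1, round(ratio))


native_da = 73_000   # apparent size by gel filtration (from the text)
subunit_da = 63_000  # apparent size by sds-page (from the text)

ratio, subunits = estimated_subunits(native_da, subunit_da)
print(f"native/subunit ratio = {ratio:.2f}, estimated subunits = {subunits}")
```

a ratio close to 1 rather than 2 is what one would expect for a monomer; a dimer of 63,000 dalton chains would be expected to run near 126,000 daltons.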
we have previously reported that partially purified trigger factor will stabilize proompa in a conformation competent for membrane assembly (crooke et al., 1988). to establish that this is a feature of the purified protein, and to characterize its action, proompa was diluted from 8 m to 0.8 m urea and assayed for translocation. immediately after dilution, proompa is competent for translocation (figure 6, lane 3) and the efficiency of this translocation is not affected by trigger factor (lane 4). however, proompa lost competence for membrane insertion after a 3 hr preincubation at 21°c (figure 6, lane 5). as previously reported (crooke et al., 1988), proompa preincubated in the presence of trigger factor retains full activity for assembly into inverted plasma membrane vesicles from e. coli (figure 6, lane 7). addition of trigger factor to proompa that had already misfolded during preincubation at 21°c did not restore its competence for assembly (lane 9), suggesting that trigger factor stabilizes the competent form of proompa rather than refolding the incompetent protein. since proompa can bind directly to membrane vesicles in the absence of trigger factor, we determined whether membranes would stabilize the translocation-competent form of the precursor protein. proompa preincubated at 21°c with membrane vesicles lost competence for subsequent membrane translocation (figure 6, lane 11), while preincubation in the presence of membranes and trigger factor, but without atp and nadh (lane 12), stabilized the assembly-active precursor. trigger factor and proompa form a 1:1 complex that can be isolated by gel filtration. proompa and trigger factor were mixed in 8 m urea, dialyzed to remove the urea and allow complex formation, and size fractionated on a superose 12 fplc column (figure 7). the complex emerged as a single, symmetric peak (figure 7, line c).
in the absence of trigger factor, proompa aggregates and emerges in the void volume (figure 7). the elution position of the complex is consistent with a 1:1 stoichiometry of proompa (molecular mass 37,563 daltons; chen et al., 1980) and trigger factor (apparent molecular mass 73,000 daltons as judged by gel filtration). a 1:1 complex can also be demonstrated by covalent cross-linking with dimethyl suberimidate (data not shown). the purified complex was stable to storage and was fully active for membrane assembly (s. l., unpublished observations). further studies will be needed to establish the structural features of this complex and the parameters of its formation and dissociation.
(lane 4) of trigger factor (100 ng). samples in lanes 5-12 were preincubated at 21°c for 3 hr before addition of membrane vesicles and assaying of membrane assembly. the samples in lanes 5 and 6 were preincubated in the absence of trigger factor (lane 6 also lacked atp and nadh during the preincubation). samples in lanes 7 and 8 were preincubated in the presence of trigger factor (lane 8 had no atp or nadh during preincubation or during membrane assembly). the sample in lane 9 was preincubated for 3 hr without trigger factor, then mixed with trigger factor and assayed for membrane assembly. samples in lanes 11 and 12 were preincubated in the presence of membrane vesicles for 3 hr at 21°c but without atp or nadh; after the 3 hr preincubation, atp and nadh were added and samples were assayed for translocation. the sample in lane 12 had trigger factor present during preincubation. lane 1 contains a proompa marker. lanes 2 and 10 had no sample.
srp stabilizes proompa for membrane translocation
like trigger factor, canine srp has been shown to recognize presecretory proteins, although until now its activity has been assayed in the context of a polysome. we therefore asked whether dog pancreas microsomes might contain a trigger factor activity. proompa was diluted from urea and preincubated at 21°c with either a salt extract of dog pancreas microsomes or with salt extract buffer alone. after various preincubation times, the proompa was assayed for assembly into e. coli plasma membrane vesicles. the microsomal extract stabilizes proompa for membrane insertion (figure 8), although not as well as trigger factor (figure 6; see also crooke et al., 1988). this may in part reflect the instability of srp itself in the absence of detergent (walter and blobel, 1980). to stabilize the srp and to allow its purification, we added 0.01% nikkol to our buffers and isolated srp by published procedures (walter and blobel, 1980). nikkol does not inhibit the membrane assembly of proompa that has already formed a complex with trigger factor (figure 9, lane 3, no detergent; lane 4, detergent added), and thus has no effect on the membrane translocation "machinery" per se or on the protease protection assay. however, it strongly inhibits the membrane assembly of proompa in the absence of stabilizing trigger factor (figures 10a and 10b, lanes 3). srp and the activity of stabilizing proompa copurified by either ω-aminopentyl agarose chromatography or sucrose gradient sedimentation. srp prepared by each of these methods (figure 10) stabilizes proompa against the detergent denaturation (lanes 4 and 5) and allows a level of assembly comparable to that seen in the absence of detergent (lanes 8). this srp-dependent membrane assembly requires atp (lanes 6 and 7, no atp). to assess the relative potency of trigger factor and srp in stabilizing proompa, comparable molar amounts of each were assayed (figure 10c). while the trigger factor is clearly somewhat superior in stabilizing proompa, this is hardly surprising since these proteins evolved in the same organism, while srp was isolated from canine pancreas.
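The stoichiometry argument for the trigger factor-proompa complex compares the size of the isolated complex with the sum of its parts. As a rough sketch, assuming masses are simply additive and using the two values quoted in the text, the expected sizes for a few candidate stoichiometries can be tabulated; the function name and the list of candidates are hypothetical.

```python
PROOMPA_DA = 37_563         # molecular mass quoted in the text (chen et al., 1980)
TRIGGER_FACTOR_DA = 73_000  # apparent mass by gel filtration, quoted in the text


def expected_complex_mass(n_proompa, n_trigger_factor):
    """Expected mass in daltons, assuming the masses of the parts simply add."""
    if n_proompa < 0 or n_trigger_factor < 0:
        raise ValueError("copy numbers must be non-negative")
    return n_proompa * PROOMPA_DA + n_trigger_factor * TRIGGER_FACTOR_DA


# Candidate stoichiometries (proompa : trigger factor) chosen for illustration.
for n_p, n_tf in [(1, 1), (2, 1), (1, 2)]:
    mass = expected_complex_mass(n_p, n_tf)
    print(f"{n_p}:{n_tf} complex -> {mass:,} Da")
```

Because the trigger factor value is an apparent size from gel filtration, the comparison is only as good as the column calibration.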
isolation of a trigger factor-proompa complex by gel filtration
proompa (300 ng; 25 mg/ml in buffer c), trigger factor (100 ng), or a mixture of the two was diluted to 250 µl in buffer c and dialyzed overnight against 1.0 liter of 66 mm tris-hcl (ph 7.6), 25 mm nh4cl, 10 mm mg acetate, 0.5 mm β-mercaptoethanol. aliquots (200 µl) were applied to a 25 ml superose 12 fplc column (equilibrated in the dialysis buffer) and eluted at a flow rate of 0.25 ml/min. fractions (250 µl) were collected, and 5 µl aliquots of each were analyzed by sds-page and silver staining. the a280 tracing and corresponding gel analysis are shown for trigger factor (a), proompa (b), and trigger factor-proompa complex (c). v0 indicates the column void volume.
in the absence (lane 2) or presence (lanes 3 and 4) of trigger factor (100 ng). samples were maintained at 21°c for 3 hr. membrane vesicles were added to each reaction, and samples were incubated at 40°c for 20 min. buffer j (3 µl, contains 0.01% nikkol) was added to the sample in lane 4 immediately before addition of membranes. translocation was assayed by protease protection. the marker in lane 1 represents 33% translocation efficiency.
the membrane assembly reaction of proompa with trigger factor and inner-membrane vesicles requires atp, the membrane potential, and the leader sequence of proompa. it therefore represents an authentic reconstitution of translocation. the proompa is translocated across an intact membrane and, as in vivo, it is processed to ompa after translocation. it is likely that other soluble proteins, such as those encoded by the seca and secb genes, would bind to our preparations of plasma membrane vesicles or, if added to the in vitro translocation reaction, would enhance either the rate or extent of the assembly reaction. the secb protein (collier et al., 1988) may have the same function as trigger factor, with different, but overlapping, substrate specificity.
further experiments with these proteins will be essential to settle these questions. the purification of proompa has provided several insights into the mechanisms of its assembly across the e. coli plasma membrane. it can spontaneously fold into a form competent for membrane assembly (crooke et al., 1988). atp is the only other essential soluble component. in the absence of other proteins, proompa loses this competence (crooke et al., 1988). at physiological temperatures, proompa misfolds within minutes (e. c., unpublished observations), suggesting a physiological role for trigger factor in stabilizing proompa by forming a stoichiometric complex. the isolation of these proteins and the demonstration of their activity in translocation across a membrane raise several important questions. perhaps foremost is the question of whether trigger factor has a role in vivo similar to that deduced from its action in vitro. this may be addressed by isolating the gene for trigger factor and establishing control of its synthesis in the cell. a second question is whether trigger factor is related to the heat shock, atp-dependent unfolding activities required for protein translocation into endoplasmic reticulum and mitochondria in yeast (deshaies et al., 1988; chirico et al., 1988). we note, however, that the cellular concentration of trigger factor is not affected by heat shock (e. c. and s. l., unpublished observations). finally, it will be important to understand how trigger factor is released from proompa as the proompa crosses the membrane. the availability of milligram quantities of pure trigger factor and proompa should facilitate studies of these questions.
each sample in lanes 3-7 contained the equivalent of 3 µl of buffer j. atp was omitted where indicated. membranes were immediately added, and samples were incubated for 20 min at 40°c. purified srp (in 3 µl of buffer j) was added to the sample in lane 8 after the translocation reaction. all samples (lanes 3-8) were assayed for translocation by incubation with proteinase k. the marker in lane 1 represents 10% translocation; lane 2 contains no sample. srp was purified from canine microsomal salt extract by elution from an ω-aminopentyl agarose column (a) or by sucrose gradient centrifugation (b). (c) [35s]proompa was renatured by dilution into translocation mixtures (without membranes) that contained the indicated amounts of purified trigger factor or ω-aminopentyl agarose-isolated srp. membrane vesicles were immediately added to reactions that contained 0.0003% nikkol and srp. the reactions were assayed for membrane assembly. reactions that contained trigger factor were preincubated for 3 hr at 21°c before addition of membrane vesicles and assaying of membrane assembly. protease-protected [35s]ompa was analyzed by sds-page. the resulting fluorography bands were quantitated by densitometric scanning.
a close analogy can be drawn between trigger factor and the srp. either may associate with both the ribosome (walter and blobel, 1981; lill et al., 1988) and with proteins bearing a leader sequence. each can allow a presecretory or membrane protein to translocate across a membrane. while the interaction of srp with precursor proteins was initially thought to be strictly coupled to an early stage of polypeptide chain growth (walter and blobel, 1981), recent studies have revealed that it can promote the translocation of nearly full-length polypeptides (rottier et al., 1985; ainger and meyer, 1988; perara et al., 1988). we have found that srp can interact with purified proompa to stabilize it in a form competent for membrane transit. this raises the question of whether srp also functions in vivo to stabilize presecretory and membrane proteins in a conformation that allows transit across the endoplasmic reticulum. the subunit structures of srp and trigger factor are, of course, completely different.
srp consists of six polypeptides and a 7s rna, while trigger factor is simply a monomeric protein. this difference in structural complexity is reminiscent of the leader peptidases of these two organisms: the e. coli enzyme is a single polypeptide (wolfe et al., 1983), while that from dog pancreas has six polypeptide subunits (evans et al., 1986). in addition to its affinity for leader sequences and ribosomes, srp has a specific membrane receptor (meyer et al., 1982). studies in the accompanying paper (lill et al., 1988) suggest that trigger factor may also bind to a specific site on the plasma membrane. our observations lead to a working model of the cyclic action of trigger factor in promoting preprotein translocation (lill et al., 1988).
s100 (gold and schweiger, 1971), and s40 were prepared from e. coli strain d10 (rna-10 rela1 spot1 metb1) grown in l-broth at 37°c. inverted inner-membrane vesicles were prepared from d10 (rhoads et al., 1984) and, where indicated, jf899 (proc24 ompa252 his-53 pure41 ilv-277 met-65 lacy29 xyl-74 rpsl97 cyca1 cycb2 tsx-63 λ-). proompa was isolated from wild-type e. coli w3110 carrying the ptrc-omp9 plasmid. cells for s40, s100, ribosomes, and proompa were harvested from a 150 liter fermenter using a sharples centrifuge. cell pastes were suspended in an equal weight of 50 mm tris-hcl (ph 7.5), 10% (wt/vol) sucrose, then frozen as small nuggets by rapid pipetting into liquid nitrogen and stored frozen at -80°c (wickner et al., 1972).
preparation of proompa-sepharose resin
proompa (475 mg of the washed pellet, figure 1)
nuggets of frozen d10 cell suspension (300 g; am = 1.0 at harvest) were thawed without warming. lysozyme (8 ml, 10 mg/ml) was added with gentle stirring. the suspension was immediately poured into centrifuge tubes and maintained on ice for 30 min. tubes were heated at 37°c for 3 min; care was taken not to subject the sample to hydrodynamic shear forces. the lysate was centrifuged for 60 min at 41,000 x g at 0°c.
the clear amber supernatant (s40) was pooled.
step ii, affinity chromatography
s40 (110 ml) was made 8 m in urea by the addition of 66 g of urea. this was mixed with 110 ml of settled sepharose cl-4b resin with covalently attached proompa, equilibrated in 50 mm tris-hcl (ph 8.0), 2 mm dtt, 8 m urea (buffer c). the s40 and resin slurry was dialyzed at 2°c with vigorous stirring against three 6 liter portions of 66 mm tris-hcl (ph 7.6), 25 mm nh4cl, 10 mm mg acetate, 2 mm dtt (buffer b). the resin was harvested by low-speed centrifugation (10 min at 150 x g). the supernatant was removed, and the resin was resuspended in 100 ml of fresh buffer b and poured into a 2.5 x 22.5 cm column. unbound proteins were eluted with 600 ml of buffer b. bound proteins were eluted with buffer c.
step iii, s-sepharose fast flow
step ii trigger factor (18 ml) was dialyzed at 2°c against two 1 liter portions of 50 mm sodium phosphate (ph 6.0), 1 mm dtt, 10% (vol/vol) glycerol (buffer d). the dialyzed sample was applied to a 15 ml (1.5 x 9.0 cm) s-sepharose fast flow column equilibrated in buffer d. the column was washed with 75 ml of buffer d. trigger factor was eluted with 0.5 m nacl in buffer d.
step iv, mono q fplc
the pool (8 ml) of step iii trigger factor was dialyzed against two 500 ml portions of 50 mm tris-hcl (ph 8.0), 1 mm dtt, 10% (vol/vol) glycerol (buffer e). the dialyzed sample was applied to an fplc mono q 10/10 column (approximately 8 ml) equilibrated in buffer e. the column was washed with 40 ml of buffer e. trigger factor was eluted with a 0-0.5 m nacl linear gradient (120 ml) in buffer e. trigger factor eluted at approximately 275 mm nacl.
isolation of trigger factor-free [35s]proompa
[35s]proompa was synthesized in a cell-free synthesis reaction (4.0 ml, 10 mci 35s-translabel) as previously described (bacallao et al., 1986). sds (0.25%) was added following the synthesis reaction. the sample was heated at 100°c for 10 min, then cooled to 21°c.
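The mono q step above elutes trigger factor with a linear 0-0.5 M NaCl gradient delivered over 120 ml, with elution at approximately 275 mM. A minimal sketch of the gradient arithmetic follows, assuming an ideal linear gradient and ignoring column and tubing delay volumes; the function names are hypothetical.

```python
def nacl_at_volume(volume_ml, start_m=0.0, end_m=0.5, gradient_ml=120.0):
    """NaCl concentration (M) after a given volume of an ideal linear gradient."""
    if not 0.0 <= volume_ml <= gradient_ml:
        raise ValueError("volume lies outside the gradient")
    return start_m + (end_m - start_m) * volume_ml / gradient_ml


def volume_at_nacl(conc_m, start_m=0.0, end_m=0.5, gradient_ml=120.0):
    """Gradient volume (ml) at which a given NaCl concentration is reached."""
    if not start_m <= conc_m <= end_m:
        raise ValueError("concentration lies outside the gradient")
    return (conc_m - start_m) / (end_m - start_m) * gradient_ml


# Gradient parameters and the ~275 mM elution point are taken from the text.
elution_volume_ml = volume_at_nacl(0.275)
print(f"275 mM NaCl is reached about {elution_volume_ml:.0f} ml into the gradient")
```

Under these assumptions the elution point falls a little past the midpoint of the gradient, which leaves room on both sides for resolving contaminants.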
triton x-100 (15% in 2.5 ml of buffer b) was added to 5.7% final concentration. [35s]proompa was immediately isolated on a 15 ml proompa affinity column as previously described (crooke and wickner, 1987).
in vitro translocation of proompa renatured by dilution
proompa (6 µl, 100,000 cpm) in 8 m urea was diluted into 60 µl translocation reaction mixtures as previously described (crooke et al., 1988). translocation of proompa into e. coli inverted inner-membrane vesicles was assayed by accessibility to added protease (bacallao et al., 1986).
trigger factor activity
trigger factor was assayed for its ability to maintain proompa in a translocation-competent state following renaturation by dialysis (crooke and wickner, 1987). for each sample, a 2 µl aliquot was mixed with trigger factor-free [35s]proompa (100,000 cpm in 98 µl of buffer c with 300 µg/ml bovine serum albumin). each mixture was dialyzed against two 100 ml portions of buffer b at 2°c for 8 hr. aliquots of the dialyzed samples were added to translocation reaction mixtures (60 µl) that contained [35s]proompa, 50 mm tris-hcl (ph 7.6), 37 mm kcl, 16 mm nh4cl, 8.2 mm mg acetate, 1.4 mm dtt, 1 mm spermidine chloride, 8 mm putrescine chloride, 330 µg of protein per ml of inverted inner-membrane vesicles, and (unless indicated) 1 mm atp and 5 mm nadh. translocation was assayed by the inaccessibility of proompa and ompa to added protease (bacallao et al., 1986). one unit (table 1) of activity is defined as the amount of trigger factor that causes 10% of the added proompa to be inaccessible to protease following translocation.
of microsomal extract and purification of srp
a salt extract of dog pancreas microsomes was prepared, and srp was purified following published procedures (walter and blobel, 1980) with the following minor modifications: buffer h (50 mm triethanolamine [ph 7.5], 500 mm kcl, 5 mm mgcl2, and 1 mm dtt) was used for the extraction of microsomes.
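The unit definition above (one unit renders 10% of the added proompa inaccessible to protease) can be turned into a small calculation. This is only a sketch: it assumes the assay responds linearly to the amount of trigger factor, which the text does not state, and the function names and the example numbers are hypothetical.

```python
PERCENT_PROTECTED_PER_UNIT = 10.0  # from the unit definition in the text


def units_of_activity(percent_protected):
    """Units in an assayed aliquot, assuming a linear assay response."""
    if not 0.0 <= percent_protected <= 100.0:
        raise ValueError("percent protected must lie between 0 and 100")
    return percent_protected / PERCENT_PROTECTED_PER_UNIT


def specific_activity(units, protein_mg):
    """Units per mg of protein in the assayed aliquot."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return units / protein_mg


# Hypothetical example: 25% of the proompa is protected by an aliquot
# containing 0.5 mg of protein.
units = units_of_activity(25.0)
print(f"{units:.1f} units; {specific_activity(units, 0.5):.1f} units/mg")
```

In practice a dilution series would be needed to confirm that a measurement falls in the range where protection scales with the amount of trigger factor.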
srp was purified by chromatography on ω-aminopentyl agarose using buffer h as the loading buffer and buffer j (50 mm triethanolamine [ph 7.5], 1 m kcl, 5 mm mgcl2, 1 mm dtt, and 0.01% nikkol) as the elution buffer. alternatively, purification was achieved by ultracentrifugation on a 5%-20% linear sucrose gradient in buffer h containing 0.01% nikkol.
proompa purification and translocation were analyzed by sds-page as described by ito et al. (1980). trigger factor and ribosome protein profiles were analyzed by sds-page using 15% acrylamide, 0.12% n,n'-methylene bis-acrylamide, and 6 m urea (ito et al., 1981). silver staining was performed by the method of ansorge (1983). radiolabeled proteins were visualized by fluorography (chamberlin, 1979). immunoblotting was according to the method of towbin et al. (1979).
we thank marilyn rice and douglas geissert for expert assistance, david eisenberg for computer searches, d. meyer for stimulating discussions and generous gifts of microsomal salt extract and purified srp, and peter walter for generous gifts of srp. this work was supported by a grant from the nih and a contract with eli lilly and co. b. g. is supported by usphs fellowship gm07104-12 (predoctoral training in genetic mechanisms at ucla). s. l. is a fellow of the medical scientist training program. r. l. is a fellow of the deutsche forschungsgemeinschaft. the costs of publication of this article were defrayed in part by the payment of page charges. this article must therefore be hereby marked "advertisement" in accordance with 18 usc section 1734 solely to indicate this fact.
received may 4, 1988; revised july 7, 1988.
translocation of nascent secretory proteins across membranes can occur late in translation
electrophoresis '82: advanced methods, biochemical and clinical applications, d.
stathakos, ed
the secy protein can act post-translationally to promote bacterial protein export
major heat shock gene of drosophila and the escherichia coli heat-inducible dnak gene are homologous
eukaryotic mr 83,000 heat shock protein has a homologue in escherichia coli
intracellular protein topogenesis. proc. natl. acad
fluorographic detection of radioactivity in polyacrylamide gels with the water-soluble fluor, sodium salicylate
atp is essential for protein translocation into escherichia coli membrane vesicles
primary structure of major outer membrane protein ii* (ompa protein) of escherichia coli k-12
70k heat shock related proteins stimulate protein translocation into microsomes
the antifolding activity of secb promotes the export of the e. coli maltose-binding protein
electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications
patterns of amino acids near signal-sequence cleavage sites
purification of a membrane-associated protein complex required for protein translocation across the endoplasmic reticulum
signal recognition protein (srp) mediates the selective binding to microsomal membranes of in-vitro-assembled polysomes synthesizing secretory protein
rna synthesis initiates in vitro conversion of m13 dna to its replicative form
multiple mechanisms of protein insertion into and across membranes
sequence of the leader peptidase gene of escherichia coli and the orientation of leader peptidase in the bacterial envelope
effects of two sec genes on protein assembly into the plasma membrane of escherichia coli
crooke, e., and wickner, w. (1987). trigger factor: a soluble protein which folds pro-ompa into a membrane assembly competent form. proc. natl. acad. sci. usa 84, 5216-5220.
crooke, e., brundage, l., rice, m., and wickner, w. (1988)
evans, e. a., gilmore, a., and blobel, g. (1986). purification of microsomal signal peptidase as a complex. proc. natl. acad. sci. usa 83, 581-585.
kuhn, a., kreil, g., and wickner, w.
(1987). recombinant forms of m13 procoat with an ompa leader sequence or a large carboxy-terminal extension retain their independence of secy function. embo j. 6, 501-505.
lill, r., crooke, e., guthrie, b., and wickner, w. (1988). the "trigger factor cycle" includes ribosomes, presecretory proteins, and the plasma membrane. cell 54, this issue.
meyer, d. i., krause, e., and dobberstein, b. (1982). secretory protein translocation across membranes: the role of the 'docking protein'. nature 297, 647-650.
müller, m., and blobel, g. (1984
key: cord-259681-k9cnikqk authors: johnson, david c.; spear, patricia g. title: o-linked oligosaccharides are acquired by herpes simplex virus glycoproteins in the golgi apparatus date: 1983-03-31 journal: cell doi: 10.1016/0092-8674(83)90083-1 sha: doc_id: 259681 cord_uid: k9cnikqk
abstract the o-linked oligosaccharides on mature forms of herpes simplex virus type 1 (hsv1) glycoproteins were characterized, and were found to account largely for the lower electrophoretic mobilities of these forms relative to the mobilities of immature forms. other posttranslational modifications of hsv1 glycoproteins (designated gb, gc, gd and ge) were related temporally to the discrete shifts in electrophoretic mobilities that signal acquisition of the o-linked oligosaccharides. fatty acid acylation (principally of ge) could be detected just prior to the shifts, whereas conversion of high-mannose-type n-linked oligosaccharides to the complex type occurred coincident with the shifts. the addition of o-linked oligosaccharides did not occur in cells treated with the ionophore monensin or in a ricin-resistant cell line defective in the processing of n-linked oligosaccharides. we conclude that extension of o-linked oligosaccharide chains on hsv1 glycoproteins, and probably also attachment of the first o-linked sugars, occurs as a late posttranslational modification in the golgi apparatus.
o-linked oligosaccharides are acquired by herpes simplex virus glycoproteins in the golgi apparatus
david c. johnson and patricia g. spear
the department of microbiology, the university of chicago, chicago, illinois 60637
considerable attention has been focused on defining the intracellular location and functional significance of various steps in the synthesis and processing of membrane-bound and secreted glycoproteins. the site of synthesis of most glycoproteins is on membrane-bound ribosomes of the rough endoplasmic reticulum, where high-mannose-type oligosaccharides are transferred en bloc from dolichol phosphate donors to asparagine residues on nascent chains (struck and lennarz, 1980) and small hydrophobic leader sequences are removed (blobel et al., 1979).
the polypeptides are then transported by unknown mechanisms, perhaps involving clathrin-coated vesicles (rothman et al., 1980), to the golgi apparatus, where fatty acid molecules may be attached (schmidt and schlesinger, 1980; dunphy et al., 1981) and high-mannose asparagine-linked (n-linked) oligosaccharides are processed to complex-type oligosaccharides (tabas and kornfeld, 1979; bretz et al., 1980; bennett and o'shaughnessy, 1981; roth and berger, 1982). from the golgi apparatus the glycoproteins move to the cell surface or are targeted to cellular organelles such as the lysosomes (neufeld and ashwell, 1980).
much less is known about the structure, assembly and attachment of o-linked oligosaccharides to glycoproteins than is known for n-linked oligosaccharides. the occurrence of o-linked oligosaccharides was first recognized in glycoproteins from mucous secretions (carlson, 1968). subsequently, glycoproteins such as fetuin (spiro and bhoyroo, 1974), human chorionic gonadotropin (bahl, 1969) and erythrocyte glycophorin (thomas and winzler, 1969) were found to contain both o-linked and n-linked oligosaccharides. the o-linked oligosaccharides were attached to serine or threonine via an n-acetylgalactosamine (galnac) residue. the structures of these oligosaccharide chains vary from disaccharides composed of galnac and sialic acid to large oligosaccharides of 20 sugar residues containing galnac, galactose, n-acetylglucosamine, fucose and sialic acid. several studies have addressed the question of the stage and location at which o-linked oligosaccharides are attached to glycoproteins during their intracellular transport.
strous (1979) concluded that galnac could be added in o-glycosidic linkage to growing polypeptides in the rough endoplasmic reticulum, whereas others (kim et al., 1971; ko and raghupathy, 1972; hanover et al., 1980) have found that enzymes capable of mediating such reactions are enriched in smooth endoplasmic reticulum and golgi membranes rather than in rough endoplasmic reticulum membranes.
cells infected with herpes simplex virus provide a useful system for investigating the processes discussed above. previous studies have shown that glycoproteins specified by herpes simplex virus type 1 (hsv1) contain o-linked oligosaccharides (oloffson et al., 1981) as well as n-linked oligosaccharides (pizer et al., 1980; serafini-cessi and campadelli-fiume, 1981; person et al., 1982; wenske et al., 1982). we have confirmed these findings, further characterized the o-linked oligosaccharides and shown that 3h-palmitate is incorporated into one of the hsv1 glycoproteins. we have also shown that the relatively large difference in electrophoretic mobilities of mature and immature forms of each hsv1 glycoprotein (spear, 1976; baucke and spear, 1979; eisenberg et al., 1979; eberle and courtney, 1980) is due primarily to the presence of o-linked oligosaccharide chains on the mature forms. using the discrete shifts in electrophoretic mobilities that occur during posttranslational processing of these glycoproteins as a marker for the acquisition of o-linked oligosaccharides, we have investigated the temporal order in which other posttranslational modifications (fatty acid acylation and processing of n-linked oligosaccharides) occur relative to these shifts.
these studies were carried out not only under conditions designed to allow normal glycoprotein processing, but also in the presence of monensin, an ionophore known to cause accumulation of secreted and membrane proteins and virions in golgi-derived vacuoles (tartakoff and vassalli, 1978; uchida et al., 1979; johnson and schlesinger, 1980; johnson and spear, 1982), and in ricin-resistant cells defective in the processing of n-linked oligosaccharides. our results indicate that extension of o-linked oligosaccharides to yield chains of sufficient number or size to affect electrophoretic mobilities of the glycoproteins, and probably also addition of the first amino acid-linked galnac residues, occur in the golgi apparatus subsequent to addition of fatty acid and coincident with the processing of high-mannose-type n-linked oligosaccharides to complex-type oligosaccharides.
released from hsv1 glycoproteins by mild alkaline borohydride
hsv1 glycoproteins were labeled with 14c-glucosamine in infected cells. glucosamine can be converted to galactosamine and sialic acid, all of which are incorporated into oligosaccharides in eucaryotic cells (kornfeld and ginsberg, 1966; oloffson et al., 1981). the 14c-labeled hsv1 glycoproteins gc and gd were isolated on preparative sds-polyacrylamide gels and treated with 0.05 m naoh, 1 m nabh4 at 45°c for 14-20 hr (mild alkaline borohydride), conditions shown to selectively release o-linked oligosaccharides (spiro, 1966; marshall and neuberger, 1977). the mixtures of released oligosaccharides and glycoprotein were chromatographed on biorad p6. 40%-60% of the total label was observed as a peak at the void volume (v0), presumably due to n-linked oligosaccharides that remain attached to the glycoprotein.
labeled material released from gc and gd and included in the p6 column was eluted principally in three peaks: peak i, containing material of approximately 2750 daltons; peak ii, of approximately 1800 daltons; and peak iii, which contained material as large as stachyose (666 daltons) and as small as glucosamine (215 daltons). a similar distribution of label was observed when preparations of hsv1 gc isolated by radioimmunoprecipitation were examined (results not shown). no labeled material was included on the column if gc was chromatographed directly after purification or if treated with only 1.0 m nabh4 (results not shown). no oligosaccharides were released under these conditions from isolated vesicular stomatitis virus (vsv) g protein (figure 1c), which contains only n-linked oligosaccharides (moyer et al., 1976).
two kinds of experiments were carried out to characterize further the gc oligosaccharides released by β-elimination. first, the isolated glycoprotein was digested with exoglycosidases prior to treatment with alkaline borohydride. the elution profile of the oligosaccharides released from neuraminidase-treated gc (figure 1e) was markedly different from that of untreated gc (figure 1d). peaks i and ii were absent, and a new peak of 14c-labeled material appeared, with a smaller apparent molecular weight than the glycopeptide of ovalbumin (1550 daltons). therefore, the oligosaccharides in peaks i and ii are highly sialated. further treatment of desialated gc with β-galactosidase produced only a very small shift in elution profile of the oligosaccharides released by alkaline borohydride (results not shown). second, gc was isolated from cells labeled with 14c-galactose, instead of 14c-glucosamine, and subjected to alkaline borohydride treatment. the elution profile, shown in figure 1f, indicates that the oligosaccharides of peaks i, ii and iii all contain galactose.
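The approximate sizes quoted for peaks i to iii come from comparing elution positions with standards of known size such as stachyose, glucosamine and the ovalbumin glycopeptide. One common way to make such an estimate, sketched here as an assumption rather than as the authors' procedure, is to fit log10(molecular weight) against elution volume for the standards and interpolate. The elution volumes below are hypothetical; only the standard masses come from the text.

```python
import math


def fit_log_mw(standards):
    """Least-squares line of log10(MW) against elution volume.

    standards: list of (elution_volume_ml, molecular_weight_da) pairs.
    Returns (slope, intercept).
    """
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two standards with distinct volumes")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(elution_volume_ml, slope, intercept):
    """Apparent molecular weight at a given elution volume."""
    return 10 ** (slope * elution_volume_ml + intercept)


# Standard masses are quoted in the text; the elution volumes are hypothetical.
hypothetical_standards = [(62.0, 1550.0), (78.0, 666.0), (96.0, 215.0)]
slope, intercept = fit_log_mw(hypothetical_standards)
print(f"apparent size at 70 ml: {estimate_mw(70.0, slope, intercept):.0f} Da")
```

Estimates outside the range spanned by the standards are extrapolations and should be treated with more caution than interpolated ones.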
we observed a preferential labeling of the larger molecular weight material in peak iii, consistent with the idea that peak iii is composed of multiple oligosaccharides differing in molecular weight and that the larger molecular weight forms contain relatively more galactose.
releases oligosaccharides from hsv1 glycoproteins, resulting in decreased electrophoretic mobilities of the glycoproteins
a preparation of the enzyme α-d-n-acetylgalactosamine oligosaccharidase (galnac oligosaccharidase) purified from clostridium perfringens has been found to release o-linked oligosaccharides from porcine submaxillary mucins by cleavage between serine or threonine residues and galnac (huang and aminoff, 1972; pomato and aminoff, 1978). under appropriate conditions this enzyme released oligosaccharides from the mature form of hsv1 gc, and the released material had an elution profile very similar to that of the oligosaccharides released by alkaline borohydride (figure 2a). the distribution of label in the three peaks was somewhat different in the galnac oligosaccharidase digest, perhaps because the larger oligosaccharides were more resistant to release by the enzyme. it has been suggested that this enzyme is less active in removing sialated oligosaccharides than desialated oligosaccharides (n. pomato, personal communication). galnac oligosaccharidase did not release 14c-glucosamine-labeled oligosaccharides from vsv g protein under these conditions (figure 2b). in addition, treatment of vsv g protein with this enzyme did not affect the mobility of the glycoprotein on sds-polyacrylamide gels, as might have been observed if oligosaccharides or monosaccharides had been removed (figure 3a). hsv1 glycoproteins were labeled with 35s-methionine in a pulse-chase experiment, and the mature and immature forms of the glycoproteins were precipitated with monoclonal antibodies prior to treatment with galnac oligosaccharidase.
the results, presented in figures 3a and 3b, illustrate a phenomenon that has been reported previously (spear, 1976; baucke and spear, 1979; eisenberg et al., 1979; eberle and courtney, 1980), namely, that posttranslational processing of hsv1 glycoproteins is accompanied by discrete shifts in their electrophoretic mobilities (to lower mobility). extracts prepared immediately after the 10 min pulse of 35s-methionine contained only the faster-migrating immature forms of the glycoproteins as labeled species, whereas extracts prepared after the 3 hr chase contained principally the slowly migrating mature forms. both forms were observed after 1 hr of chase, although the immature forms tended to be more abundant. the immature forms of the hsv1 glycoproteins were insensitive to galnac oligosaccharidase (figures 3a and 3b). however, the mobilities of the mature forms of gb, gc, gd and ge (results not shown for ge) increased after treatment with galnac oligosaccharidase. in fact, the mobilities of the enzyme-treated, mature glycoproteins closely resembled those of their respective immature forms, suggesting that the shifts in mobility that occur during the maturation of these glycoproteins result primarily from the addition or extension of o-linked oligosaccharide chains as a posttranslational, rather than a cotranslational, modification. the shifts in electrophoretic mobilities of the glycoproteins occur some 20 min to 3 hr after the polypeptides are synthesized and partially glycosylated. it should be noted that galnac oligosaccharidase used at levels 10 times higher than in these studies, in the absence of ovalbumin, produced shifts in the mobilities of vsv g protein and of the immature hsv1 glycoproteins, suggesting the presence of contaminating exoglycosidic or endoglycosidic activities. vsv g protein and the immature forms of the hsv1 glycoproteins served as internal controls to ensure that these contaminating activities had been diluted out or suppressed.
effect of neuraminidase on electrophoretic mobilities of the hsv1 glycoproteins
to test the possibility that the addition of sialic acid residues to o-linked oligosaccharides is the principal cause of the decrease in electrophoretic mobilities associated with maturation of the hsv1 glycoproteins, we treated immunoprecipitated glycoproteins labeled with 35s-methionine in a pulse-chase experiment with neuraminidase. the conditions used for enzymatic digestion were shown in a separate experiment to remove all of the 14c-n-acetylmannosamine label incorporated into gc and gd (data not presented). the electrophoretic mobilities of the mature forms of 35s-methionine-labeled gc and gd increased after neuraminidase treatment, but not to the extent observed after treatment with galnac oligosaccharidase (figure 4). electrophoretic mobilities of the immature forms were not affected by neuraminidase. similar results were obtained in experiments carried out with gb and ge (results not shown) except that it was difficult to compare the very small shifts in mobility of gb obtained by use of either enzyme. with ge, however, there was only a slight increase in mobility after neuraminidase treatment, as was found for gc (figure 4). it should be noted that neuraminidase removes sialic acid from both n-linked and o-linked oligosaccharides, so that the shifts in electrophoretic mobility observed overestimate the effect due to removal of sialic acid from o-linked oligosaccharides.
hsv1 proteins were labeled for 10 min with 35s-methionine, and then the proteins were immediately extracted (lanes p) or the label was chased for 1 hr (lanes c1) or 3 hr (lanes c2) before extraction. the hsv1 glycoproteins gb, gc and gd were immunoprecipitated with monoclonal antibodies, eluted with 2% sds, 2% β-mercaptoethanol and dialyzed against 0.1% sds.
vsv proteins and the isolated hsv1 glycoproteins were mixed with ovalbumin (1 mg/ml final concentration) and either not treated (lanes -) or treated (lanes +) with galnac oligosaccharidase for 2 hr at 37°c, prior to analysis by electrophoresis.
hsv1 proteins were labeled for 10 min with 35s-methionine, and then the proteins were immediately extracted (lanes p) or the label was chased for 3 hr (lanes c) before extraction. the hsv1 glycoproteins gc and gd were immunoprecipitated with monoclonal antibodies, and the staphylococcus aureus complexes were washed twice with 0.1 m sodium acetate (ph 5.5), 1 mm cacl2 and then incubated with no enzyme (lanes -) or with neuraminidase (0.1 u/ml) for 2 hr at 37°c (lanes n). alternatively, the complexes were washed twice with 0.1 m sodium phosphate (ph 6.4), 0.1% sds, 1 mg/ml ovalbumin, and incubated with galnac oligosaccharidase (0.5 mu/ml) for 2 hr at 37°c (lanes o).
the mature forms after desialation retain sufficient o-linked oligosaccharide that they remain electrophoretically differentiable from the immature forms. this is consistent with our finding that desialated oligosaccharide released by alkaline borohydride treatment is of considerable size (figure 1).
monensin blocks the acquisition of o-linked oligosaccharides
we (johnson and spear, 1982) and others (wenske et al., 1982) have shown that monensin prevents the posttranslational processing events that result in decreased electrophoretic mobilities of the hsv1 glycoproteins. to determine whether the abnormal forms of glycoproteins that accumulate in the presence of monensin contain o-linked oligosaccharides, we carried out two experiments. first, gd was isolated from monensin-treated infected cells and treated with mild alkaline borohydride, and the reaction mixture was chromatographed on p6. the results, shown in figure 5, demonstrate that few if any oligosaccharides or monosaccharides were released under these conditions.
the second experiment was to determine whether galnac oligosaccharidase altered the electrophoretic mobilities of gd and gc isolated from monensin-treated cells by immunoprecipitation. the data in figure 6 illustrate our previous finding that monensin blocks the modifications of the glycoproteins responsible for decreased electrophoretic mobility, and show also that the glycoproteins accumulating in monensin-treated cells were resistant to the enzyme. taken together, these findings provide evidence that monensin blocks the attachment of o-linked oligosaccharides to herpes simplex virus glycoproteins and that these oligosaccharides are largely responsible for the shifts in electrophoretic mobilities normally observed. moreover, these posttranslational modifications must occur late in the intracellular transport of these glycoproteins to the cell surface, at some stage during or after they pass through the golgi apparatus.
hsv1 proteins were labeled with 35s-methionine in untreated (lanes untreated) or monensin-treated (lanes monensin) cells for 10 min, and then the proteins were immediately extracted (lanes p) or the label was chased for 3 hr (lanes c) before extraction. the glycoproteins gc and gd were isolated by immunoprecipitation with monoclonal antibodies, not treated (lanes -) or treated (lanes +) with galnac oligosaccharidase and subjected to electrophoresis on sds-polyacrylamide gels. the mature forms of each of the glycoproteins are designated gc and gd and the immature forms are designated pgc and pgd.
were labeled with 35s-methionine as described in the legend to figure 3, and extracted immediately after the pulse (lanes p) or after chase periods of 1 hr (lanes c1) or 3 hr (lanes c2), followed by immunoprecipitation with monoclonal antibodies. the isolated glycoproteins were then treated with endo h (lanes +) or not treated (lanes -) and electrophoresed on sds-polyacrylamide gels.
coproteins gb, gd, gc and ge with endo h resulted in a marked increase in electrophoretic mobility on sds-polyacrylamide gels (figure 7). as the glycoproteins were processed to the forms with slower electrophoretic mobility (during the 3 hr chase period) they became resistant to endo h, as reported by others (serafini-cessi and campadelli-fiume, 1981; person et al., 1982; wenske et al., 1982). we found, as did wenske et al. (1982), that gb remained partially sensitive to endo h. therefore, conversion of the immature forms of hsv1 glycoproteins to the mature forms parallels conversion of the endo h-sensitive form to an endo h-resistant form. it appears that the form of the glycoprotein that has o-linked oligosaccharides also has fully processed n-linked oligosaccharides.
the covalent attachment of fatty acid molecules to viral glycoproteins has been described to occur as they pass through the golgi apparatus (schmidt et al., 1979; schmidt and schlesinger, 1980; dunphy et al., 1981). of the four hsv1 glycoproteins studied here, only ge, the igg fc receptor (baucke and spear, 1979), was found to incorporate significant quantities of label from 3h-palmitate (figure 8). when cells were pulse-labeled for 6 min with 3h-palmitate, the immature form of ge was preferentially labeled. therefore, the attachment of fatty acid, which is thought to occur early in the transport of viral glycoproteins through the golgi apparatus, appears to precede the attachment of o-linked oligosaccharides.
the ricin-resistant cell line cl6, isolated by gottlieb and kornfeld (1976) feld, 1978). we infected cl6 cells and the parental cell line l929 with hsv1 to compare them with respect to the processing of viral glycoproteins. the immature forms of gb and gd were converted to the more slowly migrating, mature forms of gb and gd in l929 cells (figure 9), as had been observed in hep-2 cells.
however, mature forms of gb and gd did not appear during the chase period in cl6 cells, nor were the glycoproteins detected in cl6 cells sensitive to galnac oligosaccharidase (results not shown). this result is consistent with the idea that hsv1 glycoproteins are not glycosylated at serine or threonine residues in this lectin-resistant cell line.
figure 8. attachment of fatty acid to hsv1 ge. hsv1-infected cells were labeled with 3h-palmitate (lanes 3h-palm) for 6 min or 3 hr and then extracted with np40-doc buffer. in parallel, monolayers were labeled for 10 min with 35s-methionine (lanes 35s-met), and the proteins were immediately extracted (lane p) or the label was chased for 3 hr (lane c) before extraction for precipitation with monoclonal antibody and electrophoresis on sds-polyacrylamide gels. approximately 50 times more of the extract from cells labeled for 6 min with 3h-palmitate was required to visualize the pge band relative to ge labeled with 3h-palmitate for 3 hr.
our results confirm a previous report (olofsson et al., 1981) on the presence of o-linked oligosaccharides in hsv1 glycoproteins, and demonstrate also that these o-linked oligosaccharides are largely responsible for the large differences in electrophoretic mobility between mature and immature forms of the glycoproteins. if the immature forms contain any o-linked carbohydrate, the number and size of these chains must be too small to affect the electrophoretic mobilities of the glycoproteins or the linkages must be resistant to galnac oligosaccharidase. results obtained from analysis of gd made in monensin-treated cells also suggest that the addition of galnac to serine and threonine residues in this polypeptide is not normally a cotranslational or early posttranslational event, unless monensin can block such a pre-golgi modification or unless insufficient radiolabel was incorporated into gd to permit detection of small monosaccharides or oligosaccharides released by alkaline borohydride.
the effects of neuraminidase treatment on electrophoretic mobilities of the mature glycoproteins indicate that sialation of the o-linked oligosaccharides cannot account fully for processing-linked shifts in electrophoretic mobility, and therefore carbohydrate moieties other than sialic acid (that is, galactose, n-acetylglucosamine) must account partially for the decreased electrophoretic mobilities (larger apparent size) associated with the presence of o-linked chains. we present evidence that the steps in synthesis of the o-linked oligosaccharides that result in decreased electrophoretic mobility occur in the golgi apparatus. first, the shifts in electrophoretic mobility occurred after addition of fatty acid to ge and coincident in time with the processing of n-linked oligosaccharides from high-mannose to complex-type oligosaccharides. both of these processes have been shown to occur in the golgi apparatus (schmidt and schlesinger, 1980; dunphy et al., 1981; tabas and kornfeld, 1979; bretz et al., 1980). second, the ionophore monensin, which interferes with golgi function (tartakoff and vassalli, 1977, 1978), blocks both the processing of n-linked oligosaccharides on hsv1 glycoproteins (wenske et al., 1982) and the addition of o-linked oligosaccharides. our conclusion about the intracellular site for addition of o-linked oligosaccharides is in agreement with findings that transferases catalyzing attachment of galnac to serine and/or threonine residues in rat intestinal mucosa (kim et al., 1971) and brain (ko and raghupathy, 1972) and hen oviduct (hanover et al., 1980) are localized in smooth membranes. in contrast, strous (1979) reported that galnac was attached to nascent epithelial polypeptides on polysomes. our data argue against attachment of o-linked oligosaccharides to nascent hsv1 glycoproteins, although it is possible that a small amount of galnac is attached as a cotranslational modification and escaped detection.
the finding that o-linked oligosaccharides were not added to hsv1 glycoproteins in cl6 cells suggests either that enzymes necessary for this addition are defective in these cells, and are therefore of cellular genetic origin, or that attachment of o-linked oligosaccharides is contingent upon the processing of n-linked oligosaccharides. the cl6 cells fail to attach terminal sugars to n-linked oligosaccharides (gottlieb and kornfeld, 1976), processes known to occur in the golgi apparatus (tabas and kornfeld, 1979; bretz et al., 1980; roth and berger, 1982). it remains to be determined whether the same enzymes participate in the addition of monosaccharides to both n-linked and o-linked oligosaccharide chains. experiments carried out in vitro have shown, however, that enzyme preparations capable of adding sugars to o-linked oligosaccharides may not be able to attach these sugars to n-linked oligosaccharides (reviewed by schachter and roseman, 1980). campadelli-fiume et al. (1982) recently reported that the conversion of high-mannose-type n-linked oligosaccharides on hsv1 glycoproteins to complex-type oligosaccharides is blocked in a ricin-resistant bhk cell line, and that infectious virions (containing immature glycoproteins) can be produced nonetheless. whether o-linked oligosaccharides were attached in the mutant bhk cell line was not discussed by these authors; they reported, however, the absence of shifts in electrophoretic mobility, which we have shown here to signal the addition of o-linked oligosaccharides. we did not investigate the infectivities of virions produced in the mutant cl6 cells because, although mouse l cells are permissive for expression of most if not all herpes simplex virus genes, they are only semi-permissive for the production of infectious virus. the o-linked oligosaccharides constituted a major fraction of the labeled carbohydrate on hsv1 glycoproteins and consisted principally of three size fractions.
the largest of these oligosaccharides may be comparable in size and composition with blood-group substances described by feizi et al. (1971). however, a large fraction of the 14c-glucosamine and 14c-galactose label released by mild alkaline borohydride from hsv1 glycoproteins chromatographed as smaller molecular weight oligosaccharides similar to the disaccharides, trisaccharides and tetrasaccharides observed in submaxillary mucins (carlson, 1968; aminoff et al., 1979), fetuin (spiro and bhoyroo, 1974), human iga (baenziger and kornfeld, 1974) and certain tumor-cell glycoproteins (bhavanandran and davidson, 1976). differences in electrophoretic mobilities between mature and immature forms of the glycoproteins suggest that a large amount of o-linked oligosaccharide is added to some species (gc, for example), although the exact number of oligosaccharide chains cannot be estimated from these differences. it is of interest that label from inorganic sulfate can be incorporated into ge (hope et al., 1982), since o-linked oligosaccharides of certain submaxillary gland mucins have been found to contain sulfated sugars (lombart and winzler, 1974). a list of possible functions of o-linked oligosaccharides would include some of those previously proposed for n-linked oligosaccharides: altering the initial folding or final conformation of polypeptides (gibson et al., 1980); targeting of glycoproteins to specific intracellular organelles or the cell surface (neufeld and ashwell, 1980); protecting the glycoproteins from proteolytic degradation (schwartz et al., 1976); and influencing binding or other properties of the glycoproteins. we can probably rule out effects of o-linked oligosaccharides on initial folding of glycoproteins because they seem to be added as relatively late posttranslational modifications. in addition, functions of the hsv1 glycoproteins essential for virion infectivity in cell culture do not depend on the presence of o-linked oligosaccharides.
this conclusion emerged from the results presented here coupled with our previous findings (johnson and spear, 1982) that infectious virions are produced in the presence of monensin and that these virions, which accumulate in intracytoplasmic vacuoles, contain immature glycoproteins. the results of this study suggested that the virion envelope, acquired at the inner nuclear membrane, initially contains immature glycoproteins that are processed to the mature forms as the virions are transported through the golgi apparatus. it seems unlikely that o-linked oligosaccharides play any role in the targeting of hsv1 glycoproteins to the nuclear membrane, but they could conceivably play some role in the transport of viral glycoproteins or virions to the cell surface. also, addition of oligosaccharides to serine and threonine residues on cellular and viral glycoproteins, already glycosylated at asparagine residues, may lead to increased hydration of the cell or virion surface and increased protection from proteolytic degradation. o-linked oligosaccharides have also been reported on glycoproteins specified by vaccinia virus (shida and dales, 1981) and coronavirus (nieman and klenk, 1981). these viruses share with hsv1 the property of acquiring their envelopes at internal cellular membranes, rather than at the plasma membrane.
immediately, and 199v containing a tenfold excess of unlabeled methionine was added to the others for 1 hr or 3 hr. cells were washed twice with pbs and scraped into pbs containing 1% np40, 0.5% sodium deoxycholate (doc) and ovalbumin at 1 mg/ml (np40-doc extraction buffer) if glycoproteins were to be immunoprecipitated or into 2% sds, 50 mm tris-hcl (ph 6.8) if glycoproteins were to be purified by sds-polyacrylamide gel electrophoresis. cells were treated with 0.2 μm monensin (from 1 mm stocks in absolute ethanol) 2 hr after infection with hsv1 (johnson and spear, 1982).
were purified by preparative electrophoresis on 1.5 mm thick, 8.5% polyacrylamide gels crosslinked with n,n'-diallyltartardiamide (datd) (heine et al., 1974). sds extracts of 14c-glucosamine- or 14c-galactose-labeled cells were sonicated and boiled for 5 min before being electrophoresed for 4.5 hr at 15 ma per gel. gels were immediately dried and subjected to autoradiography, and the autoradiographs were used to detect labeled glycoproteins. the labeled glycoprotein bands were excised from the dried gels, hydrated in 0.075 m tris-glycine, 0.2% sds, 10% glycerol, 0.05 mg/ml ovalbumin, 0.05 mg/ml cytochrome c and electrophoresed into dialysis tubing (stephens, 1975). the isolated glycoproteins were dialyzed against 0.1% sds and lyophilized. alternatively, hsv1 glycoproteins were isolated by immunoprecipitation with monoclonal antibodies (l-144, specific for gb; 11-474, specific for gc; 11-436, specific for gd; or 11-481, specific for ge) as previously described (johnson and spear, 1982) and dialyzed against 0.1% sds for treatment with enzymes or eluted into 2% sds, 50 mm tris-hcl (ph 6.8), 10% glycerol for electrophoresis on 8.5% polyacrylamide gels crosslinked with datd (heine et al., 1974). analytical gels were impregnated with 2,5-diphenyloxazole (bonner and laskey, 1974) and then dried and placed in contact with cronex medical x-ray film at -70°c.
treatment of glycoproteins and gel filtration of oligosaccharides
viral glycoproteins were incubated with 0.05 m naoh, 1.0 m nabh4 for 14-20 hr at 45°c as described by carlson (1968). high concentrations of nabh4 were found to be necessary to prevent destruction of galnac linked to serine or threonine (carlson et al., 1970). excess borohydride was destroyed by addition of 2 m hcl to ph 6.5. the mixture was chromatographed on a column (95 x 1.2 cm) of biorad p6 (200-400 mesh) and eluted with 0.1 m nh4hco3, 0.1% sds. fractions were collected, and aliquots were dried on glass filters and counted.
the column was calibrated with blue dextran 2000 (v0); the glycopeptides derived by pronase digestion of fetuin (3180 daltons) (spiro, 1962) and ovalbumin (1550 daltons) (spiro, 1966); stachyose; lactose; and n-acetylglucosamine (vi). digestions with galnac oligosaccharidase (bethesda research laboratories) were performed with enzyme at concentrations of 0.5 mu/ml on isolated viral glycoproteins in the presence of ovalbumin (1 mg/ml) and 0.1% sds in 0.1 m na2po4 (ph 6.4) for 2 hr at 37°c. the digestions were immediately stopped by addition of 2% sds and boiling for 5 min, and were loaded onto the p6 column or subjected to electrophoresis. isolated hsv1 glycoproteins were treated with neuraminidase (type x; sigma) at 0.1 u/ml in 50 mm sodium acetate (ph 5.5), 1 mm cacl2 for 20 hr at 37°c. glycoproteins were treated with β-galactosidase (type vii; sigma) at 100 u/ml in 200 mm na2po4 (ph 7.2) for 20 hr at 37°c. endo h (miles laboratories) was used at 35 mu/ml in 0.1 m sodium citrate buffer (ph 5.5), 0.1% sds.
immunochemical studies on blood groups
the role of oligosaccharides in glycoprotein biosynthesis
isolation and characterization of two mouse l cell lines resistant to the toxic lectin ricin
synthesis of n- and o-linked glycopeptides in oviduct membrane preparations
proteins specified by herpes simplex virus. xii. the virion polypeptides of type 1 strains
sulphated glycoproteins induced by herpes simplex virus
enzymes that destroy blood group specificity. v. the oligosaccharidase of clostridium perfringens
vesicular stomatitis virus and sindbis virus glycoprotein transport to the cell surface is inhibited by ionophores
monensin inhibits the processing of herpes simplex virus glycoproteins, their transport to the cell surface and egress of virions from infected cells
glycoprotein biosynthesis in small intestinal mucosa. i. a study of glycosyltransferases in microsomal subfractions
glycoprotein biosynthesis in the developing rat brain. ii. microsomal galactosaminyltransferase utilizing endogenous and exogenous protein acceptors
structure of glycoproteins and their oligosaccharide units
the metabolism of glucosamine by tissue culture cells
isolation and characterization of oligosaccharides from canine submaxillary gland mucins
aspects of the structure and metabolism of glycoproteins
oligosaccharide moieties of the glycoprotein of vesicular stomatitis virus
carbohydrate recognition systems for receptor-mediated pinocytosis
coronavirus glycoprotein e1, a new type of viral glycoprotein
o-glycosidic carbohydrate-peptide linkages of herpes simplex virus glycoproteins
glycoprotein processing in mutants of hsv-1 that induce cell fusion
effect of tunicamycin on herpes simplex virus glycoproteins and infectious virus production
α-d-n-acetylgalactosaminyl-oligosaccharidase of clostridium perfringens
immunocytochemical localization of galactosyltransferase in hela cells. co-distribution with thiamine pyrophosphatase in trans-golgi cisternae
transport of the membrane glycoprotein of vesicular stomatitis virus to the cell surface in two stages by clathrin-coated vesicles
mammalian glycosyltransferases. their role in the synthesis and function of complex carbohydrates and glycolipids
relation of fatty acid attachment to the translation and maturation of vesicular stomatitis and sindbis virus membrane glycoproteins
evidence for covalent attachment of fatty acids to sindbis virus glycoproteins
suppression of glycoprotein formation of semliki forest virus, influenza and avian sarcoma virus by tunicamycin
studies on benzhydrazone, a specific inhibitor of herpesvirus glycoprotein synthesis. size distribution of glycopeptides and endo-β-n-acetylglucosaminidase h treatment
we thank sondra schlesinger for the cl6 cells and for helpful discussions; g. campadelli-fiume and r. courtney for the communication of results prior to their publication; and mrs. valerie kohn for assistance in preparing this manuscript.
this work was supported by grants from the american cancer society and the national institutes of health. d. c. j. is supported by a nato science fellowship, awarded by the natural sciences and engineering council of canada.
received july 2, 1982; revised december 14, 1982
aminoff, d., baig, m. m. and gathmann, w. d. (1979). glycoproteins and blood group activity. oligosaccharides of a+ hog submaxillary glycoproteins. j. biol. chem. 254, 1788-1793.
baenziger, j. and kornfeld, s. (1974)
tabas, i. and kornfeld, s. (1978). the synthesis of complex-type oligosaccharides. ii. identification of an α-d-mannosidase activity involved in a late stage of processing of complex-type oligosaccharides. j. biol. chem. 253, 7779-7786.
tabas, i. and kornfeld, s. (1979
key: cord-253259-hmn7mg8j authors: shaw, a. l.; rothnagel, r.; chen, d.; ramig, r. f.; chiu, w.; prasad, b.v.venkataram title: three-dimensional visualization of the rotavirus hemagglutinin structure date: 1993-08-27 journal: cell doi: 10.1016/0092-8674(93)90516-s sha: doc_id: 253259 cord_uid: hmn7mg8j
abstract three-dimensional structures of a native simian and reassortant rotavirus have been determined by electron cryomicroscopy and computer image processing. the structural features of the native virus confirm that the hemagglutinin spike is a dimer of vp4, substantiated by in vivo radiolabeling studies. exchange of native vp4 with a bovine strain equivalent results in a poorly infectious reassortant. no vp4 spikes are detected in the three-dimensional reconstruction of the reassortant. the difference map between the two structures reveals a novel large globular domain of vp4 buried within the virion that interacts extensively with the intermediate shell protein, vp6.
our results suggest that assembly of vp4 precedes that of vp7, the major outer shell protein, and that vp4 may play an important role in the receptor recognition and budding process through the rough endoplasmic reticulum during virus maturation.
rotavirus, a genus within the virus family reoviridae, is a leading cause of severe infantile gastroenteritis worldwide. the mature virion contains six structural proteins and 11 segments of double-stranded rna (kapikian and chanock, 1990; estes, 1990). the three-dimensional structure of rotavirus was first determined by electron cryomicroscopy and computer image processing techniques (prasad et al., 1988). subsequent studies provided additional evidence for the identification of topographical features of the viral structural proteins (prasad et al., 1990; yeager et al., 1990). the mature rotavirion is ~1000 å in diameter and exhibits t = 13 icosahedral symmetry. rotavirus is classically described as a double-shelled virus, consisting of an inner shell composed of the major structural protein, vp6, and an outer shell composed of two structural proteins, vp4 and vp7 (estes and cohen, 1989). the inner shell surrounds the core, which encompasses the genome and is composed predominantly of vp2, as well as minor amounts of vp1 and vp3. on the basis of the radial density profile computed from the three-dimensional density map of the mature rotavirion, rotavirus has been proposed to be a triple-shelled structure with an inner vp2 shell, an intermediate vp6 shell, and an outer shell of vp4 and vp7. the vp2 shell lies between the radii of 210 and 265 å. the existence of this shell is further substantiated by self-assembly of baculovirus-expressed vp2 into shells with a radius of ~265 å (labbe et al., 1991). the knobby trimers of vp6 constitute the intermediate shell, between the radii of ~265 and 350 å.
the vp6 shell is bristly in appearance under the electron microscope and is perforated with 132 aqueous channels that lie on a t = 13 levo icosahedral lattice. vp7, a glycoprotein, forms a smooth outer shell ~30 å in thickness, between the radii of 350 and 380 å. this shell contains 132 aqueous channels that are in register with those in the vp6 shell. vp4, the minor of the two outer shell proteins, forms spikes that extend ~120 å from the vp7 surface. these spikes are located at one edge of the peripentonal channels and have been suggested to be dimers based on binding of two monoclonal anti-vp4 fab fragments per spike (prasad et al., 1990). consistent with the observation that rotavirus is a triple-shelled particle, we will refer to the mature particles as triple-shelled particles (previously described as double-shelled particles) and to the particles that lack the outer shell as double-shelled particles (previously described as single-shelled particles). the spike protein, vp4, has been implicated in several roles during rotavirus infection. vp4 is the rotaviral hemagglutinin (kalica et al., 1983), is a determinant of virulence and growth in cell culture (offit et al., 1986), and appears to be involved in cell binding (ruggeri and greenberg, 1991; bass et al., 1991) and penetration (kaljot et al., 1988). in addition, vp4 is susceptible to trypsin, resulting in the cleavage products vp5' (~60 kd) and vp8' (~28 kd). proteolysis of vp4 significantly enhances viral infectivity (espejo et al., 1981; estes et al., 1981). vp4 may also play a role in the maturation of progeny rotavirions (maass and atkinson, 1990), in which double-shelled particles bud through the endoplasmic reticulum and become transiently enveloped prior to acquisition of the outer shell proteins (estes, 1990). to study the structural and functional aspects of vp4, we have exploited the ability of rotavirus to undergo reassortment.
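the figure of 132 channels is what standard icosahedral lattice geometry gives for t = 13. a minimal sketch of the arithmetic follows, assuming the usual caspar-klug relations (t = h^2 + hk + k^2, 60t subunits, and 10t + 2 lattice positions of which 12 are 5-coordinated) and assuming one channel at each 5- or 6-coordinated position. the function names are illustrative and do not come from the paper.

```python
def triangulation_number(h, k):
    """Caspar-Klug triangulation number T = h^2 + h*k + k^2."""
    return h * h + h * k + k * k

def lattice_counts(t):
    """Counts implied by an icosahedral surface lattice with triangulation number t."""
    return {
        "subunits": 60 * t,                # quasi-equivalent subunit positions
        "trimers": 20 * t,                 # if subunits cluster three at a time
        "five_coordinated": 12,            # positions on the icosahedral 5-fold axes
        "six_coordinated": 10 * (t - 1),   # all remaining lattice positions
        "total_positions": 10 * t + 2,     # 5- plus 6-coordinated positions
    }

# (h, k) = (3, 1) and (1, 3) both give T = 13; they differ only in handedness.
T = triangulation_number(3, 1)
counts = lattice_counts(T)
```

for t = 13, `counts["total_positions"]` is 132, made up of 12 five-coordinated and 120 six-coordinated positions, which matches the number of channels reported for the vp6 and vp7 shells.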
the functions of various viral proteins of the segmented genome viruses, notably of the reoviridae (ramig and ward, 1991), the orthomyxoviridae (ritchey et al., 1978), and the bunyaviridae (endres et al., 1991; endres et al., 1989), have been elucidated by examining the phenotypic and genotypic variation between reassortants and their parental strains. we have chosen for our structural studies a well-characterized simian variant, sa11-4f, and an sa11-4f reassortant, r-004. r-004 is genetically identical to sa11-4f with the exception of the vp4 genome segment, which has been exchanged for the bovine strain b223 equivalent. the exchange of vp4 results in a reassortant with significantly lowered stability and a different plaque morphology. in this paper, we present the structures of the native sa11-4f and the reassortant r-004, computed to ~28 å resolution by electron cryomicroscopy and computer image processing.
the electron cryomicrographs of sa11-4f (figure 1a) and r-004 (figure 1b) show the characteristic spoke-and-wheel morphology unique to rotavirus. the virions, about 765 å in diameter excluding the spikes, exhibit a smooth outer margin. regions of density corresponding to the vp4 spikes (indicated by arrows) project outward from the surface of several sa11-4f rotavirions. these projections are not evident in the r-004 image.
three-dimensional structures of sa11-4f and r-004
the three-dimensional structures of sa11-4f and r-004 were computed from the electron cryomicrographs in figures 1a and 1b. because the reconstructions were computed from independent micrographs, corrections for magnification differences and for the contrast transfer function (ctf) were necessary before calculating a difference map between reconstructions (described in experimental procedures; see figure 2). these corrections ensure that the differences between the reconstructions are structural and not due to the imaging conditions.
unless otherwise mentioned, the three-dimensional structures shown are ctf corrected. figure 3a shows a surface representation of the sa11-4f reconstruction viewed along the icosahedral 3-fold axis, computed to ~28 å resolution based on the phase residual criterion of the common lines (crowther, 1971).

figure 3 (caption). (a) the reconstructed density map of sa11-4f reveals the complex organization of the outer shell of rotavirus. the spherical shell is composed of what appear to be trimers of vp7, arranged on a t = 13 levo icosahedral lattice. the spikes are the viral hemagglutinin and are composed of vp4. the dimeric nature of the vp4 spikes can be clearly observed from our ~26 å resolution map. (b) the reassortant r-004 has no vp4 spikes, which provides structural justification for the biological and biochemical distinctions between reassortant and parent. (c) the difference map between the sa11-4f and r-004 reconstructions reveals that the vp4 spike traverses the vp7 outer shell and interacts with the vp6 inner shell. (d) the surface representation of the r-004 vp6 shell merged with the difference map between r-004 and sa11-4f. the vp4 spikes emerge from the type ii channels in the vp6 shell. the globular interior portion of vp4 covers up the channel.

structural features observed in the outer surface and inside of sa11-4f are similar to those described for other strains of rotavirus (prasad et al., 1990; yeager et al., 1990). the surface of the vp7 outer shell is noticeably rippled, with evidence for possible trimeric clustering of the vp7 molecules. the vp7 molecules are arranged on a t = 13 levo icosahedral lattice ~785 å in diameter. a total of 132 aqueous channels perforate the vp7 shell at three distinct locations. twelve type i channels are located along the icosahedral 5-fold axes. sixty type ii channels are located at the 6-coordinated positions immediately surrounding the 5-fold axes.
the 60 remaining type iii channels are located at the other 6-coordinated positions, which neighbor the icosahedral 3-fold axes. the spikes, located at one edge of the type ii channels, are ~120 å in length from the surface of the vp7 shell. there is distinct asymmetry in the shapes of the type ii and type iii channel openings, while the type i channel has a circular opening. the type i channel is significantly smaller than the other two types. figure 3b shows the surface representation of the r-004 reconstruction viewed down the icosahedral 3-fold axis, also calculated to ~28 å resolution.

figure 4 (caption). (a) the vp4 spike from the difference map in figure 3c has been rotated to provide different angles for viewing. several observations are made: two molecules of vp4 appear to participate in spike formation, and vp4 appears to form multiple contacts with vp6. the dimeric nature of the vp4 hemagglutinin is most pronounced in the exterior vp4 portion, while the globular shape of the interior portion may allow the vp4 spike to make multiple contacts with the vp6 trimers that line the type ii channels. see text for an explanation of the white arrows and arrowheads. (b) a bottom view of the vp4 spike reveals that the globular domain that lies beneath the vp7 surface is hollow in its center. (c) a top view of the vp4 spike emphasizes the bilobed structure at its distal tip. white arrows point to the outermost edge of each lobe.

in comparing the r-004 and sa11-4f reconstructions, it is obvious that r-004 has no vp4 spikes. we examined the three-dimensional density maps at various contour levels and failed to detect any density that could be attributed to the spikes. otherwise, the outer surface structure in the sa11-4f and r-004 reconstructions showed strong similarity in vp7 morphology. to discover any other structural differences apart from the obvious lack of surface spikes, particularly beneath the vp7 surface, we computed a difference map between the parent and reassortant structures.
the difference map between uncorrected reconstructions (data not shown) was noisy beneath the vp7 surface and difficult to interpret. the difference map between the ctf-corrected reconstructions (figure 3c) is significantly less noisy. a large globular mass becomes apparent beneath the vp7 shell, in register with each of the 60 vp4 spikes. this globular domain is connected to the exterior portion of each spike by two slender stalks. no other differences between the two structures can be detected. to determine the position of the inner globular domain relative to the vp6 inner shell, we merged a truncated map of r-004 with the difference map (figure 3d). density values between the radii of 325 and 533 å in the r-004 reconstruction were removed to expose the structural organization of the vp6 inner shell. the tips of the vp6 trimers that interact with vp7 were consequently trimmed. in figure 3d it can be seen that the spikes also extend outward from the type ii channels in the vp6 inner shell, but the interior globular domains cover these channels to a large extent.

the spike structure

figure 4a shows one of the 60 equivalent vp4 spikes isolated from the difference map (figure 3c) and viewed at various angles of rotation about the longitudinal axis of the spike. we observe several recurring features in our rotation series: a bilobed structure at the distal tip of the spike (indicated by white arrows) between the radii of ~480 and 505 å, with ~70 å distance between the two lobes and ~40 å width; a distinct separation in the body of the spike (120°) that widens to a maximum of ~80 å; and two strands of density intertwining to form the spike, which associate at the base of the spike but diverge again toward the distal tip (white arrowheads). the bilobed structure of the spike is emphasized in the top view of the spike, shown in figure 4c. the globular mass is seen in all of the rotational views.
when viewed from the inside out, as in figure 4b, it becomes apparent that this globular region is not a solid mass but instead is concave.

[35s]methionine in vivo radiolabeling studies

to analyze the stoichiometric proportions of vp4 and vp7 in the mature virion, we carried out a 35s-radiolabeling experiment. virus was uniformly labeled with [35s]methionine, digested with trypsin to cleave vp4 into vp5* and vp8*, and separated by 10% sds-polyacrylamide gel electrophoresis (sds-page) as described in experimental procedures. figure 5 shows an autoradiogram of uncleaved and trypsin-cleaved sa11-4f proteins as separated by sds-page. the radioactivity contained in vp5* and vp7 (both bands) was quantitated by β scan. given that a virion contains 780 copies of vp7 (prasad et al., 1988; yeager et al., 1990) and 60 spikes, that the vp5* fragment of sa11-4f vp4 contains 12 methionine residues (mattion and estes, 1991), and that the mature form of sa11-4f vp7 contains 8 methionine residues (stirzaker et al., 1987), the calculated vp5*:vp7 ratios would be 1:8.7, 1:4.3, and 1:2.9 if each spike contained a monomer, dimer, or trimer of vp4, respectively. the mean vp5*:vp7 ratio from four independent determinations is 1:5.0 (± 0.14 standard deviation), in good agreement with the predicted ratio for each spike consisting of two molecules of vp4.

the reassortant r-004, although biochemically distinct from sa11-4f, appears morphologically similar to its parent in the electron cryomicrographs. however, the three-dimensional structure of the reassortant shows no vp4 spikes. three possibilities could explain the absence of vp4 in the r-004 structure. first, the heterologous vp4 may be unstable during assembly of the progeny virions and, during purification, become dissociated from the virus. second, the heterologous vp4 may be present on the r-004 progeny in stoichiometric proportions but be disordered so that no vp4 structure is detected after icosahedral averaging.
third, r-004 may be assembled in vivo without any vp4. biochemical evidence indicates that the first possibility is correct. it has been shown that the instability of r-004 is largely a consequence of the interactions between heterologous (b223) vp4 and native (sa11-4f) vp7. unpurified r-004 can be neutralized by 2g4, a vp4-specific monoclonal antibody, suggesting that correctly folded vp4 is present on the r-004 particles. however, in the purified r-004 particles, a significantly decreased quantity of vp4 is seen by sds-page in comparison with sa11-4f. furthermore, particle-associated vp4 decreases (along with infectivity) as the storage time of the purified r-004 increases, as measured by sds-page and plaque assay (d. c. and r. f. r., personal communication). therefore, the most likely explanation for the loss of vp4 spikes from r-004 particles is that the heterologous vp4 becomes destabilized and falls off.

difference map shows inward extension of the vp4 spike

the difference map between the native and the reassortant structures reveals a large domain beneath the vp7 surface, centered on the type ii channel and in close association with the walls of the channel, which are made of trimers of vp6. possible interpretations as to the identity of this globular mass include the following: first, it is the inward extension of the vp4 spike; second, it represents a conformational difference in vp6 between the native and the reassortant; or third, it is a new structural protein. the second interpretation is unlikely, since sa11-4f and r-004 are distinct only with regard to genome segment 4 and its protein, vp4. it is unlikely that the heterologous vp4 in r-004 would cause such a large conformational change in either vp6 or vp7 that they would contribute significantly to the observed globular mass. assuming a protein density of 1.30 g/cm3, the mass of the globular feature is about 60 kd.
such significant mass displacement in vp6 between the native and the reassortant structures is improbable. furthermore, examination of the vp6 shell in the two structures shows no evidence of mass translocation in r-004. the third interpretation also is unlikely, as there is no evidence for a new structural protein by sds-page. the observed globular mass is clearly seen connected to the exterior of the vp4 spike. the difference between the two structures is localized to this region, and the differences elsewhere are insignificant. therefore, the most plausible interpretation is the first: the large globular mass is a part of the vp4 spike.

the rotaviral hemagglutinin is a dimer

the structural details in the sa11-4f reconstruction strongly indicate that the rotavirus hemagglutinin is a dimer of vp4. mass density calculations computed from the difference map are in complete agreement with 120 molecules of vp4 (assuming a molecular mass of 88 kd). the dimeric state of the vp4 spike was initially suggested by prasad et al. (1990) from a low resolution (~35 å) structure of the simian rotavirus strain sa11 (clone 3) complexed with anti-vp4 fab fragments. these studies revealed two fab molecules bound to the bilobed structure of each spike. subsequent structural studies by yeager et al. (1990), at ~37 å, also pointed out the bilobed feature of the spikes in another strain of rotavirus. these authors noted the discrepancies between the structural and biochemical results pertaining to the oligomeric state of vp4 and suggested the need for more experimental evidence. the dimeric nature of the vp4 spike is evident at the present resolution of 28 å. at this resolution, a large separation in the body of the spike is apparent, as well as the intertwining of the two strands of vp4. these features were not observed in previous reconstructions of rotavirus.
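the methionine-labeling argument for a dimeric spike is simple arithmetic and can be checked directly. the sketch below is illustrative only: it uses the copy numbers and methionine counts quoted in the text, takes 60 spikes per virion (one per type ii channel, which is the count that reproduces the quoted ratios of 1:8.7, 1:4.3, and 1:2.9), and assumes uniform labeling of every methionine. the function names are hypothetical and not part of the original study.

```python
# Predicted vp5*:vp7 label ratios for different numbers of vp4 per spike.
# All constants are taken from the text; function names are illustrative.

VP7_COPIES = 780   # copies of vp7 per virion
SPIKES = 60        # vp4 spikes per virion (one per type ii channel)
MET_VP5 = 12       # methionine residues in the vp5* fragment of vp4
MET_VP7 = 8        # methionine residues in mature vp7


def predicted_vp7_per_vp5(vp4_per_spike: int) -> float:
    """Return x such that the predicted vp5*:vp7 label ratio is 1:x."""
    vp5_label = SPIKES * vp4_per_spike * MET_VP5
    vp7_label = VP7_COPIES * MET_VP7
    return vp7_label / vp5_label


def closest_oligomer(measured_x: float, candidates=(1, 2, 3)) -> int:
    """Pick the vp4 copy number per spike whose prediction is nearest to 1:measured_x."""
    return min(candidates, key=lambda n: abs(predicted_vp7_per_vp5(n) - measured_x))


if __name__ == "__main__":
    for n, name in ((1, "monomer"), (2, "dimer"), (3, "trimer")):
        print(f"{name}: 1:{predicted_vp7_per_vp5(n):.1f}")
    print("best match for a measured ratio of 1:5.0:", closest_oligomer(5.0))
```

a nearest-prediction rule is the simplest possible comparison; it ignores the measurement error, which at ± 0.14 is small relative to the spacing between the three predictions.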
the results of previous biochemical studies are not in agreement with any oligomeric structure for the vp4 spike. densitometric measurements from coomassie-stained polyacrylamide gels of sa11 (liu et al., 1988) predict a vp4:vp7 ratio of 1:55. this ratio is much lower than that predicted for even a monomeric vp4 spike (1:13). since coomassie blue is known to stain proteins differentially, it is difficult to use this method for quantitation. our radiolabeling studies provide a more accurate measurement of the quantity of vp4 and vp7 in the mature rotavirion. the vp5*:vp7 ratio that we have determined from highly purified [35s]methionine-radiolabeled virus is in excellent agreement with each spike being a dimer of vp4.

it is not surprising that the rotavirus hemagglutinin is a vp4 dimer. viral hemagglutinins tend to function as multimeric structures. within the reoviridae, the reovirus hemagglutinin, σ1 (weiner et al., 1978; yeung et al., 1987), has been suggested to be a trimer (strong et al., 1991) or tetramer (bassel-duby et al., 1987; nibert et al., 1990; fraser et al., 1990). the σ1 protein modulates tissue tropism (lee et al., 1981) and is functionally similar to the rotavirus vp4. several more examples of multimeric hemagglutinins can be found in other virus families. the hemagglutinin of coronavirus, also an enteric virus, functions as a dimer (hogue et al., 1989). the well-studied influenza hemagglutinin is a trimer (wiley and skehel, 1987). in addition, the hemagglutinin-neuraminidase of sendai virus, a member of the paramyxoviridae, has been isolated in both dimeric and tetrameric forms (laver et al., 1989).

in the vp4 spike, a 2-fold axis of symmetry is not particularly evident. the location of the spike does not correspond to any local or strict 2-fold axis of the icosahedron. thus, the dimeric configuration is apparently asymmetric. the asymmetry of the spike can be seen clearly in the views provided by figure 4a.
this result is consistent with that of prasad et al. (1990), in which the two fab fragments bound to the vp4 spike at different angles. prasad et al. (1990) suggested that the fab fragments exhibited different elbow angles. it is very likely that the antigenic sites recognized by the fab fragments are not in structurally similar environments, resulting in different angles for fab binding. one pertinent question is what advantage this asymmetry confers. it is possible that the asymmetry is required for stabilizing interactions between quasi-equivalent vp7 and vp6 molecules. on the outer virion surface it appears that vp4 interacts with two molecules of vp7, but inside it appears that the large globular domain interacts with all six of the vp6 molecules surrounding the type ii channel. however, it is possible that there are more specific interactions between the vp4 dimer and two of the six vp6 molecules.

the presence of a large globular domain internal to the outer shell raises interesting questions regarding the assembly of vp4 in the rotavirus particles. during progeny maturation, newly assembled double-shelled particles migrate from the viroplasm to the rough endoplasmic reticulum, through which they bud, resulting in a transiently enveloped particle (estes et al., 1983). the steps by which double-shelled particles become triple-shelled are poorly understood. one pending question is at what stage vp4 assembly takes place: before or after the assembly of vp7? our results favor the hypothesis that vp4 assembly precedes that of vp7 and, thus, the budding event. it would be difficult for the vp4 internal domain, which is ~70 å in diameter, to be incorporated into the rotavirion through the vp7 type ii channels, which are only ~55 å in diameter. vp4 is synthesized on the free ribosomes in the cytoplasm (estes, 1990).
examining figure 3d, one can envision newly synthesized vp4 interacting with the vp6 of double-shelled particles prior to interaction with the rough endoplasmic reticulum membrane. once assembled on the double-shelled particles, vp4 may aid in recognition of ns28, a nonstructural protein that has been implicated in mediating passage of double-shelled particles through the rough endoplasmic reticulum (au et al., 1988). maass and atkinson (1990) reported the interaction of ns28 with vp4 and vp7 in infected cells. more recent evidence indicates that the cytoplasmic c-terminus of ns28 has a binding site for both single-shelled particles and vp4 (au et al., 1993). structural studies on double-shelled particles complexed with ns28 and vp4 will be of interest in this regard.

we conclude from our structural and biochemical studies that the rotavirus hemagglutinin is a dimer of vp4. the dimeric nature of the vp4 hemagglutinin can clearly be seen in our sa11-4f structure. radiolabeling studies further support this conclusion. we also have found that the vp4 structure traverses the outer vp7 shell and interacts with the intermediate shell protein, vp6. the difference map between sa11-4f and the reassortant r-004 reveals a large globular domain beneath the vp7 shell that is part of the vp4 structure. this result suggests that vp4 assembly occurs prior to vp7 assembly and that vp4 might interact with double-shelled particles before budding through the rough endoplasmic reticulum. future structural studies focusing on double-shelled particles complexed with vp4 should provide better insight into the maturation process of rotavirus.

preparation and purification of native and reassortant virus

two rotavirus strains were used in these studies. the first, sa11-4f, is a variant of the simian rotavirus sa11. sa11-4f was originally isolated by pereira et al. (1984) and was extensively characterized by burns et al. (1989).
it has been shown to be more stable than the standard sa11 and is purified at higher yields. because of these qualities, we have chosen this variant for our structural studies. the second, reassortant virus r-004, was derived from a cross of sa11-4f and the bovine rotavirus b223. r-004 contains all sa11-4f genome segments except segment 4. genome segment 4 and its encoded vp4 are derived from b223 in r-004 (chen et al., 1989). in contrast with sa11-4f, r-004 is very unstable and can be purified only to low yield. in addition, r-004 has been shown to express an unexpected antigenic phenotype.

viruses were purified as previously described. in brief, ma104 cell monolayers in 150 cm2 flasks were infected with second passage virus at a multiplicity of infection of 10 pfu per cell. after 1 hr of adsorption at 37°c, the inoculum was removed, and the infected monolayer was maintained in 20 ml of serum-free medium 199 containing 1% aprotinin (sigma; 22 u/ml). the infected monolayer was incubated at 37°c until complete cytopathic effect was observed. cell lysates were frozen and thawed three times before purification by a standard cscl gradient protocol. the triple-shelled particles were collected and dialyzed extensively against tris-buffered saline (8.0 g/l nacl, 0.38 g/l kcl, 0.1 g/l na2hpo4, 1.0 g/l dextrose, 3.0 g/l tris, 0.1 g/l mgcl2, 0.1 g/l cacl2 [ph 7.4]) at 4°c. triple-shelled particles were stored at 4°c in the presence of 0.01% nan3.

radiolabeled virus

radiolabeled virus was prepared as follows: ma104 cell monolayers in 150 cm2 flasks were infected with 20 pfu per cell of sa11-4f. flasks were incubated for 1 hr at 37°c to allow virus adsorption to cells, and the inoculum was removed. the infected monolayer then was maintained in 20 ml of eagle's medium lacking methionine for 2.5 hr at 37°c.
the monolayer then was put in 20 ml of eagle's medium lacking methionine per 150 cm2 flask, to which 20 mci/ml of [35s]methionine (icn, 1029 ci/mmole) and 5 mg/ml of actinomycin d were added. the flasks were incubated at 37°c until complete cytopathic effects were observed. virus was purified by standard cscl gradient methods as previously described. purified triple-shelled particles were suspended in tris-buffered saline and stored at 4°c in the presence of 0.01% nan3.

quantitation of relative ratio of radiolabeled vp7:vp5*

direct quantitation of vp4 by sds-page is difficult because vp4 migrates closely to vp2. however, the vp4 trypsin cleavage products, vp5* and vp8*, do not migrate closely to any rotaviral proteins. virus particles were treated with 5 µg/ml trypsin before sds-page analysis. the radioactivity of vp5* and vp7 was measured by quantitation of counts per minute (cpm) in bands separated by 10% sds-page using a betascope blot analyzer (model 603, betagen, waltham, massachusetts).

electron cryomicroscopy

specimen preparation for electron cryomicroscopy is well established (dubochet et al., 1988; prasad et al., 1992). we have employed a containment system adapted for virus specimen preparation (jeng et al., 1988). the virus suspension (~5 µl) was placed onto a carbon-coated holey grid, blotted, and plunged into a bath of liquid ethane (−180°c). the frozen-hydrated specimen was transferred under liquid nitrogen to a gatan cryoholder and observed in a jeol 1200 transmission electron microscope at 100 kv and a specimen temperature of −185°c. micrographs with regions of interest were recorded as focal pairs with intended sequential underfocus settings of ~1 µm and ~2 µm, using a low electron dose of ~5 e−/å2 at a nominal magnification of 30,000×.

computer image processing

micrographs were digitized using a perkin-elmer micro-10 microdensitometer with a step size of 25 µm per pixel, which corresponds to 8.33 å in the object.
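the sampling figures in this section follow from the scanner step size and the nominal magnification. the short sketch below reproduces that conversion; the function names are illustrative, and the nyquist limit of two pixels is the standard sampling bound rather than a value stated in the text. the half-width of a 128-pixel box comes out close to the 533 å outer radius used when truncating the r-004 map.

```python
# Object-space sampling for a digitized micrograph.
# Inputs are the values quoted in the text; function names are illustrative.

def pixel_size_angstrom(step_um: float, magnification: float) -> float:
    """Pixel size at the specimen, in angstroms (1 micrometer = 1e4 angstroms)."""
    return step_um * 1e4 / magnification


def nyquist_limit_angstrom(pixel_a: float) -> float:
    """Finest spacing that can be represented at this sampling (two pixels)."""
    return 2.0 * pixel_a


def box_width_angstrom(box_px: int, pixel_a: float) -> float:
    """Width of a square particle box at the specimen, in angstroms."""
    return box_px * pixel_a


if __name__ == "__main__":
    px = pixel_size_angstrom(step_um=25.0, magnification=30000)
    print(f"pixel size: {px:.2f} angstroms")
    print(f"nyquist limit: {nyquist_limit_angstrom(px):.1f} angstroms")
    print(f"128-pixel box: {box_width_angstrom(128, px):.0f} angstroms wide")
```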
individual particles were boxed into a 128 x 128 pixel region, floated, and masked with a suitable radius. the three-dimensional reconstruction procedure is similar to that used in earlier studies (crowther, 1971; fuller, 1987; baker et al., 1990; prasad et al., 1992). the sa11-4f and r-004 reconstructions were each generated from particles chosen from a single micrograph. the center of each particle (corresponding to the phase origin) was determined using cross-correlation methods. the orientation of each particle was determined by a common lines procedure (crowther, 1971; fuller, 1987), which makes use of the icosahedral symmetry. particles in the image further from focus, which exhibits enhanced contrast, were used to find the orientations of the corresponding particles in the image closer to focus. the images closer to focus were used for the three-dimensional reconstruction. only those orientations that gave rise to a phase residual value of less than 50° were considered for the reconstruction. the phase origins and orientations were refined iteratively using intra- and interparticle refinement methods. once a well-distributed set of particles was established (~40 particles), the three-dimensional density map was generated using cylindrical expansion. the reconstructions were computed to a resolution of ~28 å, as judged by the phase residual values. surface representations of the three-dimensional maps were generated and displayed with iris explorer (silicon graphics).

corrections for magnification differences and the ctf

since the two structures were obtained from two different micrographs, it was necessary to correct for the possible differences in magnification and defocus levels prior to computing the difference map. two steps were carried out before computing the difference map between native and reassortant structures. in the first step, particles were scaled in fourier space to account for the possible magnification differences between the two micrographs.
the appropriate radial scale factor was computed by calculating the average phase residual for the 60 cross common lines in a pair-wise comparison of the particles, using one particle from the sa11-4f set as a template. the radial scale factor was varied between 0.85 and 1.15 at an interval of 0.01, and the phase residual for the cross common lines was computed as a function of the radial scale factor. the magnification difference between the micrographs was determined to be ~0.8%. in the second step, the amplitudes in the fourier transforms of the particles were corrected for the ctf (erickson and klug, 1971) using an amplitude contrast factor, or q value, of 14% of the phase contrast (smith and langmore, 1992). this factor was previously estimated to be about 7% by toyoshima and unwin (1988). smith and langmore (1992) have suggested that their estimate of 14% could be due to thicker ice. to embed the virus particles completely, the ice thickness has to be more than 1000 å. hence, we chose the value of 14%. since the reconstructions were carried out to a resolution lower than that corresponding to the first minimum of the ctf, no corrections to phases were necessary.

to calculate the ctf, the precise defocus value must be known. to calculate the defocus value we made use of the fresnel fringe (white ring) that is seen gracing the periphery of the particle images. the fresnel fringes arise owing to the interference between the scattered and unscattered electrons. in focus, the particle edge, representing a sharp discontinuity in the object, has low contrast, and the fresnel fringe is not perceptible. when defocused, these fringes are clearly seen, and their width is directly proportional to the level of defocus (hall, 1966). the fresnel fringe manifests itself as a deep valley at a radius corresponding to the outer surface in the radial density plot computed from the three-dimensional density maps (figure 2).
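the dependence of the ctf on defocus and on the amplitude contrast factor can be made concrete with a small calculation. the sketch below is not the authors' program: it assumes a common textbook form of the ctf, -(sin χ + q cos χ), with underfocus taken as positive and q expressed as a fraction of the phase contrast, and it uses the standard relativistic formula for the electron wavelength. the 100 kv accelerating voltage, the 14% amplitude contrast, and the 5.4 mm spherical aberration coefficient are the values used in this section; all function names are hypothetical.

```python
import math

# Minimal ctf sketch under the stated assumptions; lengths are in angstroms.


def electron_wavelength_angstrom(kv: float) -> float:
    """Relativistic electron wavelength for an accelerating voltage in kV."""
    v = kv * 1e3
    return 12.2643 / math.sqrt(v + 0.97845e-6 * v * v)


def phase_shift(s: float, defocus_a: float, cs_a: float, wavelength_a: float) -> float:
    """Wave aberration chi(s) in radians; s is spatial frequency in 1/angstrom."""
    return (math.pi * wavelength_a * defocus_a * s ** 2
            - 0.5 * math.pi * cs_a * wavelength_a ** 3 * s ** 4)


def ctf(s: float, defocus_a: float, cs_a: float, wavelength_a: float, q: float = 0.14) -> float:
    """Contrast transfer with amplitude contrast q relative to phase contrast."""
    chi = phase_shift(s, defocus_a, cs_a, wavelength_a)
    return -(math.sin(chi) + q * math.cos(chi))


def first_zero_spacing_angstrom(defocus_a: float, cs_a: float,
                                wavelength_a: float, q: float = 0.14) -> float:
    """Real-space spacing (angstroms) at which the ctf first crosses zero."""
    chi0 = math.pi - math.atan(q)            # first positive root of sin(chi) + q*cos(chi)
    a = 0.5 * math.pi * cs_a * wavelength_a ** 3
    b = math.pi * wavelength_a * defocus_a
    disc = b * b - 4.0 * a * chi0            # chi is quadratic in u = s**2
    if disc < 0:
        raise ValueError("chi never reaches the first zero for these parameters")
    u = (b - math.sqrt(disc)) / (2.0 * a)    # smaller root is the first crossing
    return 1.0 / math.sqrt(u)


if __name__ == "__main__":
    lam = electron_wavelength_angstrom(100.0)
    cs = 5.4e7                               # 5.4 mm in angstroms
    for defocus_um in (1.6, 2.2):
        d = first_zero_spacing_angstrom(defocus_um * 1e4, cs, lam)
        print(f"defocus {defocus_um} um: first ctf zero near {d:.1f} angstroms")
```

other sign and normalization conventions for the ctf are in use, so the absolute sign of the curve should not be read into this sketch; the position of the first zero is what determines whether phases need correcting at a given resolution.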
we varied the defocus value between 0.5 µm and 2.5 µm, with an initial interval of 0.2 µm to obtain a rough estimate of the defocus value and later an interval of 0.1 µm to refine our estimate. for each defocus value within this range, the amplitudes in the particles within the first minimum were corrected for the ctf, using a cs value of 5.4 mm and a q value of 14%. the three-dimensional reconstruction was computed using these corrected amplitudes. a radial density plot was then generated from the ctf-corrected reconstructions and examined to see if the minimum from the fresnel fringe had flattened. using the disappearance of the minimum in the radial density plots as an indicator, the appropriate defocus value for each reconstruction was determined. this minimum flattens out for sa11-4f when using a defocus value of 2.2 µm and for r-004 when using a defocus value of 1.6 µm. for confirmation, we also calculated the defocus value of the corresponding micrograph of the focal pair of sa11-4f, which is 1 µm further from focus, and found the defocus value to be 3.2 µm. the evaluation of the defocus values using the fresnel fringe is possible only because of the relatively smooth outer surface of the rotavirus particles.

rotavirus morphogenesis involves an endoplasmic reticulum transmembrane glycoprotein a subviral particle binding domain on the rotavirus nonstructural glycoprotein ns28.
virology three-dimensional structures of maturable and abortive capsids of equine herpesvirus 1 from cryoelectron microscopy identification and partial characterization of a rhesus rotavirus binding glycoprotein on murine enterocytes evidence that the sigma 1 protein of reovirus serotype 3 is a multimer biological and immunological characterization of a simian rotavirus sal 1 variant with an altered genome segment 4 phenotypes of rotavirus reassortants depend upon the recipient genetic background specific interactions between rotavirus outer capsid proteins vp4 and vp7 determine expression of across-reactive, neutralizingvp4-specificepitope determinants of rotavirus stability and density during cscl purification procedures for three-dimensional reconstruction of spherical viruses by fourier synthesis from electron micrographs cryo-electron microscopy of vitrified specimens the large viral rna segment of california serogroup bunyaviruses encodes the large viral protein neuroattenuation of an avirulent bunyavirus variant maps to the l rna segment the fourier transform of an electron micrograph: effects of defocussing and aberrations, and implications for the use of underfocus contrast enhancement structural polypeptides of simian rotavirus sall and the effect of trypsin rotaviruses and their replication rotavirus gene structure and function proteolytic enhancement of rotavirus infectivity molecular mechanisms rotaviruses: a review molecular structure of the cell attachment protein of reovirus correlation of computer-processed electron micrographs with sequence-based predictions the t = 4 envelope of sindbis virus is organized by interactions with a complementary t = 3 capsid introduction to electron microscopy genetic recombination with newcastle disease virus, poliovirus, and influenzavirus synthesis and processing of the bovine enteric coronavirus hemagglutinin protein containment system for the preparation of vitrified-hydrated virus specimens identification of 
the rotaviral gene that codes for the hemagglutinin and proteaseenhanced plaque formation infectious rotavirus enters cells by direct cell membrane penetration, not by endocytosis rotaviruses expression of rotavirus vp2 produces empty corelike particles crystallization of sendai virus hn protein complexed with monoclonal antibody fab fragments protein sigma 1 is the reovirus cell attachment protein trimerization of the reovirus cell attachment protein (sigma 1) induces conformational changes in sigma 1 necessary for its cell-binding function identification of the simian rotavirus sall genome segment 3 product rotavirus proteins vp7, ns28, and vp4 form oligomeric strucutres sequence of a rotavirus gene 4 associated with unique biological properties structure of the reovirus cell-attachment protein a model for the domain organization of sigma i molecular basis of rotavirus virulence role of gene segment 4 genomic heterogeneity of simian rotavirus sall structure of rotavirus three-dimensional structure of rotavirus localization of vp4 neutralization sites in rotavirus by threedimensional cryo-electron microscopy threedimensional structure of single-shelled bluetongue virus three-dimensional transformation of capsids associated with genome packaging in a bacterial virus genomic segment reassortment in rotavirus and other reoviridae mapping of the influenza virus genome. ill. 
identification of genes coding for nucleoprotein, membrane protein, and nonstructural protein antibodies to the trypsin cleavage peptide vp8 neutralize rotavirus by inhibiting binding of virions to target cells in culture quantitation of molecular densities by cryo-electron microscopy processing of rotavirus glycoprotein vp7: implications for the retention of the protein in the endoplasmic reticulum biochemical and biophysical characterization of the reovirus cell attachment protein sigma 1: evidence that it is a homotrimer contrast transfer for frozen-hydrated specimens: determination from pairs of defocused images identification of the gene coding for the hemagglutinin of reovirus the structure and function of the hemagglutinin membrane glycoprotein of influenza virus three-dimensional structure of rhesus rotavirus by cryoelectron microscopy and image reconstruction purification and characterization of the reovirus cell attachment protein sigma 1

correspondence should be addressed to b. v. v. p. we thank dr. m. k. estes for her comments and suggestions. we acknowledge support from national institutes of health grants gm41064 (b. v. v. p. and w. c.), all6667 (r. f. r.), and rr02250 (w. c.) and from the w. m. keck foundation (a. l. s., b. v. v. p., and w. c.). received march 22, 1993; revised june 10, 1993.

key: cord-318276-so5jooj0 authors: bertholet, christine; van meir, erwin; ten heggeler-bordier, béatrice; wittek, riccardo title: vaccinia virus produces late mrnas by discontinuous synthesis date: 1987-07-17 journal: cell doi: 10.1016/0092-8674(87)90211-x sha: doc_id: 318276 cord_uid: so5jooj0 abstract we describe the unusual structure of a vaccinia virus late mrna. in these molecules, the protein-coding sequences of a major late structural polypeptide are preceded by long leader rnas, which in some cases are thousands of nucleotides long.
these sequences map to different regions of the viral genome and in one instance are separated from the late gene by more than 100 kb of dna. moreover, the leader sequences map either upstream or downstream of the late gene, are transcribed from either dna strand, and are fused to the late gene coding sequence via a poly(a) stretch. this demonstrates that vaccinia virus produces late mrnas by tagging the protein-coding sequences onto the 3′ end of other rnas. the 5' flanking sequences of cellular genes transcribed by rna polymerase ii contain conserved sequence elements that are responsible for correct initiation of transcription and regulation of gene expression (breathnach and chambon, 1981) . in the case of dna viruses that utilize the host-cell transcription machinery, the 5' flanking regions of the genes closely resemble those of the host cell. vaccinia virus, a member of the poxvirus family, belongs to the small group of dna viruses that replicate in the cytoplasm of the host cell. gene expression in vaccinia virus is temporally well regulated and occurs in two distinct phases. early genes are transcribed before the complete uncoating of the viral dna, and some encode enzymes that are subsequently used for dna replication. late genes encode most structural polypeptides and are expressed after dna replication. vaccinia virus thus offers a unique opportunity to study eukaryotic transcription and mechanisms of gene regulation that presumably have evolved independently from those of the host cell. all enzymes required for the production of mrnas are carried in the virion (reviewed in moss, 1985) and, although the mrnas are made in the cytoplasm, they show such typical eukaryotic features as methylated caps (wei and moss, 1975) and poly(a) tails (kates and beeson, 1970) . the mrnas produced early in infection have an average size of about 1500 nucleotides (oda and joklik, 1967; cooper et al., 1981; mahr and roberts, 1984) . 
furthermore, these molecules are not spliced (wittek et al., 1980; cooper et al., 1981) and processing at the 5' end has also been ruled out (venkatesan and moss, 1981). late transcription is characterized by several unusual features. first, a large fraction of late rna can self-hybridize to form double-stranded structures, indicating extensive symmetrical transcription late in infection (colby and duesberg, 1969; duesberg and colby, 1969; colby et al., 1971; boone et al., 1979). second, late mrnas are on average about twice as long as early mrnas and very heterogeneous in size (oda and joklik, 1967; cooper et al., 1981; mahr and roberts, 1984). these properties have been explained by a failure of the virus to terminate transcription specifically late in infection (moss, 1985). several studies have demonstrated that the sequences involved in gene regulation in vaccinia virus reside in the 5' flanking region of early and late genes (weir and moss, 1984; bertholet et al., 1985; cochran et al., 1985). recent analyses (bertholet et al., 1986; hanggi et al., 1986) have revealed unexpected features of the putative promoter region of a strongly expressed late gene that encodes a major structural polypeptide of molecular weight 11,000 (11k gene). surprisingly, very short stretches, of about 20 nucleotides, preceding the translation initiation codon are sufficient to regulate late gene expression. all mutations around the putative mrna start site, however, have abolished transcription. in several late genes the putative transcription initiation site has been located within, or very close to, the highly conserved sequence taaatg, which also includes the translation initiation codon (weir and moss, 1984; bertholet et al., 1985; hirt et al., 1986; weinrich and hruby, 1986). thus late mrnas appear to have extremely short nontranslated leader sequences. in all these studies, however, the 5' ends of the mrnas were mapped by the nuclease s1 procedure.
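as an illustration of the conserved element described above, the following minimal python sketch locates taaatg motifs in a dna string and reports where the embedded atg begins. the function name `find_taaatg` and the example sequence are hypothetical; the sequence is made up for illustration and is not the 11k gene sequence.

```python
def find_taaatg(seq):
    """Return the 0-based position of the A of each ATG embedded in a TAAATG motif.

    Illustrative helper only (hypothetical name); it scans one strand and
    reports overlapping matches.
    """
    seq = seq.upper()
    motif = "TAAATG"
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 3)  # the ATG begins 3 bases into TAAATG
        start = seq.find(motif, start + 1)
    return hits


# Made-up example sequence, not taken from the paper.
example = "ccgtTAAATGaattc"
atg_positions = find_taaatg(example)
print(atg_positions)
```

a sequence without the motif simply yields an empty list, so the helper can be applied to any candidate 5' flanking region.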
in this communication we show that the 5' end of the 11k mrna, as defined by this technique, does not represent the true 5' end. instead, the protein-coding sequences of the late gene are preceded by long, polyadenylated rnas. primer extension of the 11k rna to define the 5' end of the 11k late mrna by an alternative procedure, we performed primer extension experiments. as a control, we included the thymidine kinase (tk) early mrna, which showed identical map positions for the 5' end by both s1 analysis and primer extension (bajszar et al., 1983; weir and moss, 1983). appropriate 5' end-labeled dna fragments (figure 1a) were hybridized to early or late rna from infected cells and then either extended with reverse transcriptase or treated with nuclease s1 (figure 1b). for the tk mrna, primer extension and s1 nuclease analysis yielded dna fragments of identical length and of the expected size. a protected fragment of 300 nucleotides was obtained by the nuclease s1 procedure with the 11k mrna. this places the putative 5' end of the late transcript very close to the a residue of the atg translation initiation codon, a map position similar to the one reported previously (bertholet et al., 1985).
figure 1. primer extension and nuclease s1 analysis of the tk early and 11k late mrnas (a) shows the coding sequences of the genes (thick lines); the direction of transcription is indicated by the arrows. the map positions of the 5' end-labeled fragments used as primers and s1 probes are also indicated. for analysis of the tk and 11k mrnas, the primers were hybridized to early and late rna, respectively (b, lanes p2), or to trna as a control (b, lanes p1), and extended with reverse transcriptase. appropriate fragments were also hybridized to early or late rna and then treated with s1 nuclease (b, c, lanes s). resulting dna fragments were analyzed on a 6% polyacrylamide sequencing gel. the sizes (in nucleotides) of dna fragments are indicated at right. to sequence the rna, primer extensions of the 11k late mrna were also performed in the presence of dideoxynucleotides (c, lanes g, a, t, c). the dna sequence complementary to the rna around the translation initiation codon, as read from the gel, is shown at the right in (c).
primer extension, however, yielded a different and unexpected result. a series of closely spaced bands was observed and these fragments were on average about 20 nucleotides longer than the corresponding nuclease s1-protected fragment (figure 1b). in addition, intense bands were seen around the position of 1500 nucleotides. this material was resolved into multiple bands upon agarose gel electrophoresis (not shown), indicating considerable length heterogeneity. to exclude the possibility that the primer extension result was due to nonspecific hybridization of the dna fragment, the primer was hybridized to rna under the same conditions but was extended in the presence of dideoxynucleoside triphosphates (figure 1c). an easily readable sequence was obtained up to the position of the s1-protected fragment, and this sequence corresponded to that reported previously for the 11k gene (bertholet et al., 1985). beyond this point, bands were observed in all four lanes but were most intense in the "t" track, indicating that this region is rich in a residues (mrna-like strand). in fact, at least 3-4 a residues immediately upstream of the 11k atg translation initiation codon could be read. this experiment thus clearly demonstrated that the primer hybridized to the expected rna but that the rna contained additional sequences upstream of the 5' end defined by nuclease s1 analysis, and that this stretch was heterogeneous in length and possibly also in sequence. to characterize these extra sequences, a cdna library was made from total poly(a)-containing late rna. we wish to emphasize that cdna clones were selected on the basis of two criteria only.
first, clones were isolated by colony hybridization using as probe an ecori-hindiii fragment of 135 bp from the very beginning of the 11k coding sequences (note that the ecori site starts at the g residue of the 11k atg). second, only those cdna clones in which that ecori site was present were chosen for further analysis. all cdna clones that fulfilled these criteria are shown schematically in figure 2, where they are compared with the corresponding genomic dna around the 11k gene. in addition to the ecori site, hindiii and clai sites characteristic of the 11k gene coding sequence and 3' flanking region were also present in all six cdna clones. furthermore, the four cdna clones with long 3' flanking regions also had the expected bamhi site. thus, whereas the sequences downstream of the ecori site are characteristic of the 11k gene and its 3' flanking region, this is not true for the sequences upstream of it. in this region the restriction maps of the cdna clones differ from the corresponding region of genomic dna. furthermore, these upstream sequences vary considerably in length between different cdna clones, ranging in size from less than 100 nucleotides to more than 2000 nucleotides (the latter for cdna clone 9). the presence of short as well as long sequences upstream of the 11k atg thus reflects the primer extension result. moreover, the different restriction maps of the three long cdnas (clones 3, 8, and 9) suggest that these sequences differ. hybridization of cdna-derived probes to rna the extra sequences added onto the 11k coding sequences are either of viral or cellular origin. we reasoned that hybridization of cdna-derived probes exclusively to rna from infected cells would argue that the rnas are of viral origin, although such a result would not exclude the possibility that they represent cellular rnas that are induced upon virus infection.
rna was isolated from noninfected or infected cells and then hybridized to 32p-labeled probes isolated from the 5' end region, upstream of the 11k coding sequences of cdna clones 3, 8, and 9 (see figure 2). as a control, a fragment containing the coding region of the 11k gene and 3' flanking sequences was also hybridized to various nitrocellulose-bound rnas. weak hybridization to rna from uninfected cells was only observed when the filters were washed at low stringency (figure 3). in contrast, the probes from cdna clones 3 and 8 hybridized strongly to both early and late rna from infected cells. whereas these probes hybridized somewhat more strongly to early than to late rna, the opposite is true for the probe derived from cdna 9. finally, the probe specific for the 11k gene hybridized strongly to late rna, as expected, but also to early rna. this may be due to the presence of long early rnas of unknown significance; these rnas have been found to be transcribed from various parts of the vaccinia virus genome (wittek et al., 1980; cooper et al., 1981), and they may traverse early or late genes on the same or opposite dna strand. hybridization of cdna-derived probes to viral dna the hybridization experiment suggested that the sequences upstream of the 11k coding sequences are transcribed from the vaccinia virus genome. to confirm this, and to identify the regions from which they are transcribed, the cdna-derived probes used in the previous experiment were hybridized to cloned dna restriction fragments representing the entire vaccinia virus genome (figure 4). each probe hybridized to only one fragment; in the case of probes p3 and p8, the fragments map more than 100 kb apart on the genome.
moreover, with respect to the position of the 11k gene, which is located at the junction of the hindiii f and hindiii e fragments (wittek et al., 1984) and is transcribed from left to right, probe p3 hybridized to sequences located upstream (see below), whereas probes p8 and p9 clearly hybridized to sequences downstream of it.
32p-labeled dna probes (p3, p8, p9, p11k) derived from the 5' region of the corresponding cdna clones or from genomic dna (see figure 2) were hybridized to rna isolated from uninfected (hela) or infected (early, late) cells, or to trna immobilized on a nitrocellulose membrane. after hybridization, parallel strips were washed at 38°c in 50% formamide, 0.1% sds, and the indicated ssc concentration (see experimental procedures). n.d., not determined.
size heterogeneity of the 11k rna population the previous experiments demonstrated that various rnas transcribed from different regions of the viral genome serve as leader rnas. the 11k mrna population should thus exhibit considerable length heterogeneity. this was tested by northern blot analysis (figure 5). when a dna fragment from the 11k coding region was used as a hybridization probe (p11k), two long rna species were detected with early rna. these presumably represent rnas transcribed from the dna strand opposite to the 11k gene previously detected by nuclease s1 analysis (wittek et al., 1984).
a synthetic oligonucleotide complementary to nucleotides 44 to 65 downstream of the g of the 11k atg of the noncoding strand was used as a primer for sequencing. plasmid dna was used to sequence clones 1 and 3; sequences for clones 8 and 9 were obtained from fragments cloned into single-stranded phage dna (see figure 2 for clone designations). the genomic sequence (mrna-like strand) around the 11k atg is compared with the corresponding region of the cdna (bottom). boldface, underlined letters indicate common sequences, and the bar represents the dna region to which the 5' end of the 11k mrna was mapped by nuclease s1 analysis.
with late rna, a smear was observed over a wide size range. this size heterogeneity is certainly in part due to the well-established length heterogeneity at the 3' end of late rnas (reviewed by moss, 1985). on the basis of the previous experiments, however, it is clear that the 5' end also contributes to the overall size heterogeneity. total early and late rnas were also hybridized to probes p3, p8, and p9 derived from the leader rnas. particularly with probes p8 and p9, smears were seen with late rna, demonstrating that late rnas encoded in these parts of the genome also vary in size. contrary to what one might have expected, probe p11k did not detect smears that comigrated with those detected with the leader-rna-derived probes. this is not surprising since, as shown by cdna cloning, the length heterogeneity contributed by the 5' end of the 11k rna population results from fusion of different leader rnas to the 11k coding region. in addition, a given leader rna as represented by probe p3, p8, or p9 may be fused to various other rnas, of which the 11k rna population presumably only represents a small subset. furthermore, some rnas detected by the leader rna probes may themselves be acceptor rnas to which other leader rnas are fused. sequence at the junction of leader rna and 11k coding region to characterize the junction at the 11k and foreign sequences, the relevant portions of the cdnas were sequenced. the cdna clones 3, 8, and 9 were chosen for this analysis, as was clone 1, which contained only a short stretch of additional dna upstream of the ecori site (figure 6). an identical sequence, corresponding to the 11k gene (bertholet et al., 1985), was obtained for all four cdna clones up to the position of the atg codon.
beyond this point, clone 1 had a long poly(a) stretch (on the mrna-like strand) of about 40 residues, which led to the typical problems in the sequencing reactions described for long homopolymer stretches (smith, 1980) and which presumably also caused the problems in the sequencing of the rna (figure 1c). this region was followed by a poly(g) tract, as expected from the cdna cloning protocol used. the cdna clones 8 and 9 also contained long poly(a) sequences of at least 80 residues. this is less obvious for cdna 3, in which only the first three bases can clearly be identified as a residues. in the three long cdnas, the homopolymer stretches appeared to be followed by other sequences, although the actual sequences could not be read in this part of the gel. appropriate fragments from the 5' end of the cdna clones were therefore subcloned, and in each case stretches of 300 to 400 nucleotides were sequenced in the opposite direction (not shown). this confirmed that the leader rnas contain additional sequences upstream of the poly(a) tract. nuclease s1 protection of a cdna-derived probe the primer extension and cdna cloning experiments described above depended on the use of reverse transcriptase. to rule out the possibility that the unusual structure at the 5' end of the 11k mrna as revealed by these experiments is due to an artifact generated by reverse transcriptase, we searched for means independent of this enzyme to confirm the observed structures. a nuclease s1 protection experiment was performed using as hybridization probe a dna fragment derived from cdna clone 1 (figure 7a). this fragment was 5' end-labeled at the hindiii site within the coding sequences of the 11k gene, and it contained a stretch of 30 t residues (complementary strand) immediately upstream of the aaatg genomic sequence (see also figure 6) comprising the 11k translation initiation codon.
this sequence is preceded by 15 c residues resulting from the cdna cloning procedure, and the fragment was isolated after cleavage at the psti site just upstream of this sequence. a second probe consisted of a xbai-hindiii fragment derived from genomic dna and was also labeled at the hindiii site. the genomic (figure 7b, lane 5) and cdna-derived probes (lane 8) were analyzed either directly or after hybridization to early rna and s1 treatment (lanes 1 and 2). as expected, no protected fragments were observed with early rna and either probe. note that in contrast to the probe used for hybridization to rna (figure 3), the s1 probe was labeled on one strand only and therefore did not detect the early rna transcribed from the dna strand opposite to the 11k gene (wittek et al., 1984). total late rna from infected cells protected fragments of 127 and 128 nucleotides of the genomic probe, consistent with the previous experiments, which demonstrated that the sequence complementarity between the genomic dna and 11k mrna ends just upstream of the 11k atg codon. significantly, protected fragments of about 158 nucleotides were observed with the cdna-derived probe, as well as some reannealing of the probe (figure 7b, lane 4). these bands can best be explained by the presence of a poly(a) sequence upstream of the 11k atg codon. the ladder pattern in the lower part of the gel probably results from nuclease s1 cleavage at internal positions within the poly(a) sequence. however, very little, if any, material was observed at the position of the protected fragments obtained with the genomic probe, suggesting that the majority of the 11k mrna molecules contain such a poly(a) stretch upstream of the translation initiation codon. to confirm that the 11k mrna molecules also contain sequences upstream of the poly(a) tract, we performed electron microscopy of dna-rna hybrids.
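the fragment sizes reported for the protection experiment above can be checked with simple arithmetic. the sketch below assumes that the cdna-derived probe is protected over the same 127 to 128 nucleotides as the genomic probe plus the 30 t residues that pair with a poly(a) tract in the rna; the function name `expected_protected_length` is hypothetical, and the calculation is an illustration of the reasoning, not part of the original analysis.

```python
def expected_protected_length(genomic_protected, homopolymer_in_probe):
    """Expected protected length of the cDNA-derived probe (illustrative only).

    genomic_protected: length protected on the genomic probe (nucleotides)
    homopolymer_in_probe: number of T residues in the cDNA-derived probe that
        can pair with a poly(A) stretch upstream of the ATG
    """
    return genomic_protected + homopolymer_in_probe


# Values taken from the text: 127 and 128 nt (genomic probe), 30 T residues.
predicted = [expected_protected_length(n, 30) for n in (127, 128)]
print(predicted)
```

under this assumption the predicted sizes are 157 and 158 nucleotides, in line with the fragments of about 158 nucleotides observed with the cdna-derived probe.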
a clai dna fragment consisting of about 700 bp of dna downstream of the 11k atg codon and about 600 bp upstream of the translation initiation codon was isolated. this dna was hybridized to 11k rna purified by hybrid selection to dna of the 11k coding region, and the resulting molecules were analyzed by electron microscopy. many molecules with a "y" structure characteristic of the two hybrids shown in figure 8 were examined. four different regions can be distinguished, two of which were constant in size and two of which varied in size. we interpret the different regions as follows: region 1, which is constant in size, represents the single-stranded dna of the 11k 5' flanking sequence. region 2 varies in size from 890 to 6070 nucleotides and represents the leader rna. the double-stranded region 3 results from hybridization of the 11k coding sequence and 3' flanking sequence to the corresponding region of the rna. finally, region 4 is the 3' portion of the rna. the fact that this region again varies considerably in length is not unexpected from the 3' length heterogeneity of late vaccinia virus transcripts (see moss, 1985). thus the results obtained by electron microscopy are consistent with the unusual structure of the 11k mrna proposed on the basis of the previous experiments. the following experiment was designed to map precisely the genes encoding the leader rnas on the vaccinia virus dna and to determine their direction of transcription. the restriction sites present in cdna clones 3, 8, and 9 were first mapped within the large hindiii fragments to which the corresponding cdna-derived probes had hybridized (see also figure 4). figure 9a shows the map positions of these sequences within the hindiii fragments f, b, and i. these are shown in the correct left-to-right orientation with respect to the conventional orientation of the genome (figure 9b).
thus, the sequences in cdna clone 3 map to the left-hand half of the hindiii f fragment, whereas the 11k gene is located at the extreme right-hand end (wittek et al., 1984). the sequences in cdna clone 8 map at the left-hand end of the 30 kb hindiii b fragment, and those present in clone 9 map toward the right of the hindiii i fragment. furthermore, from the 5'-to-3' polarity of the leader rnas established by restriction analysis of the cdna clones (see also figure 2) and as indicated by the poly(a) sequences (figure 9a), it is obvious that the leader rnas in cdna clones 3 and 9 are transcribed from the leftward-reading strand of the vaccinia virus genome (figure 9b). on the other hand, the 11k mrna is transcribed in the opposite direction (wittek et al., 1984), as is the leader rna of cdna clone 8. we have shown that late in infection vaccinia virus produces chimeric mrnas consisting of the coding sequences for a major structural polypeptide downstream of poly(a) sequences. we have also found cdna clones that contain only the poly(a) stretch, and thus lack the sequences preceding it. it is possible that these molecules result from premature termination of reverse transcription during cdna cloning. it is known, at least for the klenow fragment of dna polymerase, that long homopolymer stretches are not easily copied (smith, 1980). this interpretation is also consistent with the results obtained by electron microscopy, where we have found no evidence for rna molecules that had only very short sequences upstream of the 11k coding sequences. on the contrary, most molecules observed possessed even longer leader rnas than those found by cdna cloning. both procedures also confirmed the 3' length heterogeneity of late rna. we have sequenced the 3' ends of five cdna clones and have found stretches of between 30 and 100 a residues (not shown).
these ends, therefore, most likely represent the true 3' ends and did not result from priming at internal positions in the rna molecules. on the other hand, it is clear from the data presented here that the 5' end also contributes significantly to the overall length heterogeneity of late rna. an important question is whether this unusual mechanism of rna synthesis is unique to the particular late gene examined. a preliminary analysis of cdna clones for another late gene that we have recently mapped and sequenced (hirt et al., 1986) has shown that they also contain a poly(a) stretch and additional sequences upstream of the translation initiation codon that are not transcribed from dna in the immediate vicinity of the gene. perhaps the most intriguing aspect of the bizarre structure of the 11k mrna concerns its translation. in most eukaryotic mrnas the aug closest to the 5' end is used for translation initiation (kozak, 1978), although exceptions to this rule, particularly for mrnas of viral origin (reviewed in kozak, 1986), have been described. if the 11k polypeptide is indeed translated from the mrnas described in this paper, the translation initiation codon would be located in several cases thousands of nucleotides downstream of the 5' end. furthermore, the leader sequences certainly contain several augs. one might therefore expect a poor translation efficiency of the 11k sequences. however, the opposite appears to be true. from the abundance of the polypeptide, the 11k gene is judged to be one of the most strongly expressed vaccinia virus genes, since its product contributes about 10% to the total protein mass present in purified virions (sarov and joklik, 1972; moss, 1974).
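the first-aug rule discussed above can be illustrated with a short python sketch. it builds a toy chimeric rna (leader, poly(a) tract, coding start) and lists the aug triplets that lie 5' of the authentic start codon. all sequences and names here (`find_augs`, `leader`, `poly_a`, `coding`) are hypothetical and chosen only for illustration; they are not the 11k mrna or its leaders, which are far longer.

```python
def find_augs(rna):
    """Return 0-based positions of every AUG triplet, in any reading frame."""
    rna = rna.upper().replace("T", "U")
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]


# Toy chimeric RNA: made-up leader + poly(A) tract + start of a coding region.
leader = "GCAUGCCAUGG"   # hypothetical leader containing two AUGs
poly_a = "A" * 10         # hypothetical poly(A) tract joining leader and body
coding = "AUGAAUUC"       # hypothetical coding start
mrna = leader + poly_a + coding

authentic_start = len(leader) + len(poly_a)   # position of the coding AUG
augs = find_augs(mrna)
first_aug = augs[0]                           # AUG closest to the 5' end
upstream = [i for i in augs if i < authentic_start]

print(first_aug, authentic_start, upstream)
```

in this toy example the aug closest to the 5' end lies in the leader, not at the authentic start, which is the situation that makes efficient translation of the 11k sequences unexpected under a strict first-aug rule.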
furthermore, in vitro translation of hybrid-selected rna, which was used to map the 11k gene (wittek et al., 1984) rna that was first selected by hybridization to the cdna of one of the leader rnas and then on the 11k coding sequences, suggests that such chimeric rnas may indeed be functional 11k mrnas. another interesting question is whether the leader rna sequences are also translated. at present this question cannot be answered, since in the previous mapping of the 11k gene (wittek et al., 1984) we used immunoprecipitation of the in vitro translation products of hybrid-selected rna to identify the 11k polypeptide. any additional polypeptides translated from 11k mrna would therefore have been lost. it should be borne in mind, however, that these unusual mrnas are translated in infected cells, and results obtained in a reticulocyte lysate might not reflect the in vivo situation. mrnas, this could explain the shutoff of early gene expression late in infection, which has been particularly well studied in the case of the viral tk gene (mcauslan, 1963a, 1963b; jungwirth and joklik, 1965; zaslavsky and yakobson, 1975; hruby and ball, 1981 tides of cellular mrnas, including the cap structure, to prime the synthesis of its mrnas (plotch et al., 1981; herz et al., 1981; reviewed in krug, 1985), which thus consist of cellular and viral sequences. a similar mechanism has also been reported for bunyavirus (patterson et al., 1984). coronaviruses use a virus-encoded leader rna of about 70 nucleotides to prime the synthesis of the individual mrnas on the template molecule (lai et al., 1984; baric et al., 1985; reviewed in krug, 1985). a nonviral example is the mrnas of trypanosomes, which contain an identical 35 nucleotide leader sequence (reviewed in borst, 1986; van der ploeg, 1986). it is not clear how the leader is added to the body of the mrna, but trans splicing appears to be the most likely mechanism (murphy et al., 1986; sutton and boothroyd, 1986).
we can only speculate on the mechanism by which the long polyadenylated rnas are fused to the late coding sequences in vaccinia virus. splicing of a single, very long precursor molecule can be excluded, since the sequences added to the body of the late mrna are transcribed from either dna strand and are located either upstream or downstream of the 11k gene. a priming mechanism can be envisaged in which the poly(a) sequence of the donor rna interacts with sequences on the dna coding strand. an alternative hypothesis is that the two rnas are joined together by ligation. one would then have to explain how transcription of the late coding sequences is initiated. although it cannot be excluded that the upstream sequences act as promoters, we prefer an alternative model in which the highly conserved 5'-taa-atg-3' sequences represent processing sites at the rna level. it is well established that very long transcripts are made particularly late in infection, and from both dna strands (reviewed in moss, 1985) . these transcripts might represent precursor molecules from which the late coding sequences are excised by a site-specific endoribonuclease. interestingly, an enzyme that appears to possess the required properties has been isolated from vaccinia virions and partially characterized (paoletti and lipinskas, 1978) . clearly, further work is needed to understand the mechanism by which donor and acceptor rnas become joined. however, in comparison with trypanosomes, vaccinia virus represents a relatively simple system that should greatly facilitate the unraveling of this unusual mechanism of rna production. virus and cells the wr strain of vaccinia virus was obtained from bernard moss, national institutes of health, bethesda, md. hela cells were grown as monolayer cultures in dulbecco's modified eagle's minimal essential medium supplemented with 5% fetal calf serum. 
total rna from infected hela cells was isolated after lysis of the cells with 8 m guanidinium hydrochloride exactly as previously described (wittek et al., 1984). early rna was isolated at 4 hr after infection from cells maintained in medium containing either 100 µg/ml of cycloheximide or 40 µg/ml of cytosine arabinoside. late rna was extracted at 7 hr after infection from cells that were not treated with either inhibitor. s1 analysis the 5' ends of rna were mapped by s1 analysis (berk and sharp, 1977) using 5' end-labeled dna fragments as hybridization probes (weaver and weissman, 1979). the experimental details were as previously described (wittek et al., 1984). cdna cloning and identification of cdna clones poly(a)-containing cytoplasmic rna from infected cells isolated late in infection was annealed to oligo(dt)-tailed plasmid pcdv-1 vector dna (okayama and berg, 1983) purchased from pharmacia. all subsequent steps were performed exactly as described in the detailed protocol supplied by pharmacia, except that jm109 host bacteria were used for the initial plasmid amplification. desired cdna clones were identified by colony hybridization using an ecori-hindiii dna fragment located at the beginning of the 11k coding sequences (bertholet et al., 1985); the fragment was 32p-labeled by repair synthesis (maniatis et al., 1982). dna from positive colonies was isolated, and used to transform competent dh5.1 host cells (vector cloning systems, san diego, cal.). slot blot hybridization dna or rna was immobilized on nitrocellulose membranes using a minifold apparatus (schleicher and schuell). the filters were dried at 60°c for 2 hr and prehybridized in a solution consisting of 150 µg/ml of denatured herring sperm dna, 5x ssc (ssc is 0.15 m nacl, 0.015 m sodium citrate), 25 mm sodium pyrophosphate, 1x denhardt's solution (denhardt, 1966) and 50% formamide, at 40°c for 4-12 hr.
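because the stringency of the washes depends on the ssc concentration, it can help to convert the "x" notation into molarities. the following minimal python sketch uses the definition given above (1x ssc is 0.15 m nacl, 0.015 m sodium citrate) and tabulates the 5x prehybridization solution together with the 2x, 0.5x, and 0.2x washes used for the filters. the names `SSC_1X` and `ssc_molarity` are hypothetical and serve only this illustration.

```python
# 1x SSC composition in mol/l, as defined in the text.
SSC_1X = {"nacl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(factor):
    """Return the molar concentrations of each component for `factor` x SSC."""
    return {name: round(conc * factor, 6) for name, conc in SSC_1X.items()}


# 5x is the prehybridization solution; 2x, 0.5x, and 0.2x are wash solutions.
table = {factor: ssc_molarity(factor) for factor in (5, 2, 0.5, 0.2)}
for factor, molarities in table.items():
    print(f"{factor}x ssc: {molarities}")
```

lower ssc factors correspond to lower salt and therefore more stringent washes, so 0.2x ssc is the most stringent of the conditions listed.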
hybridizations were carried out at 40°c for 16-90 hr with 32p-labeled dna probes (rigby et al., 1977) that were denatured and diluted into prehybridization solution. for rna-dna hybridizations (figure 3) parallel filters were washed at 38°c in 50% formamide, 0.1% sds, and either 2x ssc, 0.5x ssc, or 0.2x ssc. for dna-dna hybridizations (figure 4) the filters were washed at 38°c in 60% formamide, 0.1% sds, 0.2x ssc. northern blot analysis for northern blot analysis, 10 µg of total early or late rna was denatured with glyoxal as previously described (mcmaster and carmichael, 1977) and then size-fractionated by electrophoresis in a 1% agarose gel. the rna was then transferred to nitrocellulose membranes (thomas, 1980) and hybridized to dna fragments that had been 32p-labeled by the random-primer procedure (feinberg and vogelstein, 1983). prehybridization, hybridization, and washing of the membranes in 0.5x ssc were performed as described for slot blot hybridization. dna sequencing sequencing of dna was performed by the chain-termination method (sanger et al., 1977) either on linearized, denatured plasmid dna or after appropriate dna fragments were subcloned on single-stranded recombinant phage wb238 dna (barnes and bevan, 1983; sahli et al., 1985). electron microscopy for electron microscopy, 11k-specific rna was selected by hybridization to a single-stranded hindiii-clai fragment containing the 11k coding sequences. the dna was covalently bound to sephacryl s-1000 (pharmacia) by the procedure of seed (1982) with modifications (bonemann et al., 1982; bonemann, 1982). a clai dna fragment spanning the 11k gene (wittek et al., 1984) was purified from an agarose gel. dna-rna hybrids were formed as described by brack et al. (1976). rna and dna were dissolved at 10 µg/ml, and 1 µl of each was diluted into 14 µl of 76% formamide, 100 mm pipes-naoh (ph 7.8), 20 mm tris-hcl (ph 7.8), 4 mm edta, 500 mm nacl. samples were heated to 80°c for 10 min and then incubated at 49°c for 1 hr.
the mixture was then diluted with 10 volumes of 70% formamide, 100 mm tris-hcl (ph 8.5), 10 mm edta, 50 µg/ml of cytochrome c, and spread on distilled water as hypophase. the samples were processed for electron microscopy as previously described (brack et al., 1978). plasmid pbr322 and phage φx174 dna were included as double- and single-stranded length standards, respectively.
immobilization of denatured dna to macroporous supports: i. efficiency of different coupling procedures in vitro mutagenesis of the promoter region for a vaccinia virus gene: evidence for tandem early and late regulatory signals double-stranded rna in vaccinia virus infected cells mechanism of synthesis of vaccinia virus double-stranded ribonucleic acid in vivo and in vitro extension of the transcriptional and translational map of the left end of the vaccinia virus genome to 21 kilobase pairs a membrane filter technique for the detection of complementary dna on the biosynthesis and structure of double-stranded rna in vaccinia virus-infected cells a technique for radiolabeling dna restriction endonuclease fragments to high specific activity conserved taaat motif in vaccinia virus late promoters: overlapping tata box and site of transcription initiation influenza virus, an rna virus, synthesizes its messenger rna in the nucleus of infected cells localization and fine structure of a vaccinia virus gene encoding an envelope antigen control of expression of the vaccinia virus thymidine kinase gene studies on "early" enzymes in hela cells infected with vaccinia virus ribonucleic acid synthesis in vaccinia virus. ii. synthesis of polyriboadenylic acid how do eucaryotic ribosomes select initiation regions in messenger rna?
bifunctional messenger rnas in eukaryotes
the role of rna priming in viral and trypanosomal mrna synthesis
characterization of leader rna sequences on the virion and mrnas of mouse hepatitis virus, a cytoplasmic rna virus
arrangement of late mrnas transcribed from a 7.1-kilobase ecori vaccinia virus dna fragment
molecular cloning: a laboratory manual
control of induced thymidine kinase activity in the poxvirus infected cell
the induction and repression of thymidine kinase in the poxvirus-infected hela cell
analysis of single- and double-stranded nucleic acids on polyacrylamide and agarose gels by using glyoxal and acridine orange
reproduction of poxviruses
replication of poxviruses
identification of a novel y branch structure as an intermediate in trypanosome mrna processing: evidence for trans splicing
hybridization and sedimentation studies on "early" and "late" vaccinia messenger rna
a cdna cloning vector that permits expression of cdna inserts in mammalian cells
soluble endoribonuclease activity from vaccinia virus: specific cleavage of virion-associated high-molecular-weight rna
la crosse virions contain a primer-stimulated rna polymerase and a methylated cap-dependent endonuclease
a unique cap (m7gpppxm)-dependent influenza virion endonuclease cleaves capped rnas to generate the primers that initiate viral rna transcription
labeling deoxyribonucleic acid to high specific activity in vitro by nick translation with dna polymerase i
dna sequence comparison between two tissue-specific variants of the autonomous parvovirus, minute virus of mice
dna sequencing with chain-terminating inhibitors
studies on the nature and location of the capsid polypeptides of vaccinia virions
diazotizable arylamine cellulose papers for the coupling and hybridization of nucleic acids
dna sequence analysis by primed synthesis
evidence for trans splicing in trypanosomes
hybridization of denatured rna and small dna fragments transferred to nitrocellulose
discontinuous transcription and splicing in trypanosomes
in vitro transcription of the inverted terminal repetition of the vaccinia virus genome: correspondence of initiation and cap sites
mapping of rna by a modification of the berk-sharp procedure: the 5' termini of β-globin mrna have identical map coordinates
methylated nucleotides block 5'-terminus of vaccinia virus mrna
a tandemly-oriented late gene cluster within the vaccinia virus genome
nucleotide sequence of the vaccinia virus thymidine kinase gene and the nature of spontaneous frameshift mutations
regulation of expression and nucleotide sequence of a late vaccinia virus gene
expression of the vaccinia virus genome: analysis and mapping of mrnas encoded within the inverted terminal repetition
mapping of a gene coding for a major late structural polypeptide on the vaccinia virus genome
control of thymidine kinase synthesis in ihd vaccinia virus-infected thymidine kinase-deficient lm cells

we are grateful to anne seilertuyns and philippe walker for the

bertholet, c., drillien, r., and wittek, r. (1985). one hundred base pairs of 5' flanking sequence of a vaccinia virus late gene are sufficient to temporally regulate late transcription. proc. natl. acad. sci. usa 82, 2096-2100.
bertholet, c., stocco, p., van meir, e., and wittek, r. (1986). functional analysis of the 5' flanking sequence of a vaccinia virus late gene. embo j. 5, 1951-1957.
boone, r. f., parr, r. p., and moss, b. (1979). intermolecular duplexes formed from polyadenylated vaccinia virus rna. j. virol. 30, 365-374.
borst, p. (1986). discontinuous transcription and antigenic variation in trypanosomes. ann. rev. biochem. 55, 701-732.
brack, c., hirama, m., lenhard-schuller, r., and tonegawa, s.
(1978
key: cord-279463-bli8hwda authors: lipp, joachim; dobberstein, bernhard title: the membrane-spanning segment of invariant chain (iγ) contains a potentially cleavable signal sequence date: 1986-09-26 journal: cell doi: 10.1016/0092-8674(86)90710-5 sha: doc_id: 279463 cord_uid: bli8hwda abstract the human invariant chain (iγ) of class ii histocompatibility antigens spans the membrane of the endoplasmic reticulum once. it exposes a small amino-terminal domain on the cytoplasmic side and a carboxy-terminal, glycosylated domain on the exoplasmic side of the membrane. when the exoplasmic domain of iγ is replaced by the cytoplasmic protein chloramphenicol acetyltransferase (cat), cat becomes the exoplasmic, glycosylated domain of the resulting membrane protein iγcat∗. deletion of the hydrophilic cytoplasmic domain from iγcat gives rise to a secreted protein from which an amino-terminal segment is cleaved, most likely by signal peptidase. we conclude that the membrane-spanning region of iγ contains a signal sequence in its amino-terminal half and that hydrophilic residues at the amino-terminal end of a signal sequence can determine cleavage by signal peptidase.

translocation of proteins across the membrane of the endoplasmic reticulum (er) requires signal sequences and specific receptors that recognize them (see recent reviews by hortsch and meyer, 1984; walter et al., 1984; rapoport and wiedmann, 1985; wickner and lodish, 1985). signal sequences have been found at the amino-terminal end of precursors for secretory and transmembrane proteins. in many cases they are cleaved during their translocation across the membrane by a specific protease (signal peptidase). signal sequences are quite variable in length, ranging from 16 to more than 50 amino acid residues (von heijne, 1983). they all have a central core of hydrophobic amino acid residues, and most of them have a positively charged amino-terminal segment (von heijne, 1985). signal sequences on nascent polypeptides are recognized by the signal recognition particle (srp), a ribonucleoprotein complex that mediates the interaction with the membrane by selective binding to docking protein (or srp receptor) (walter et al., 1981b; meyer et al., 1982; gilmore et al., 1982).

membrane proteins are also inserted into the er membrane by an srp-mediated mechanism (anderson et al., 1983; rottier et al., 1985; spiess and lodish, 1986; lipp and dobberstein, 1986). those spanning the membrane once have either the carboxyl terminus (type i membrane proteins) or the amino terminus (type ii membrane proteins) exposed on the cytoplasmic side. membrane insertion of type i membrane proteins most likely proceeds in a manner very similar to that of secretory proteins (lingappa et al., 1978).
type i membrane proteins are usually synthesized with a cleavable signal sequence and, in contrast to secretory proteins, are held in the membrane by a "stop transfer" sequence. examples of type i membrane proteins are the vesicular stomatitis virus g protein and class i and class ii histocompatibility antigens (lingappa et al., 1978; dobberstein et al., 1979).

of the type ii membrane proteins so far investigated, all are synthesized without a cleavable signal sequence. the neuraminidase of influenza virus (bos et al., 1984), the invariant chain (ii or iγ) of class ii histocompatibility antigens (claesson et al., 1983; strubin et al., 1984; long, 1985; lipp and dobberstein, 1986), the transferrin receptor (schneider et al., 1984), and the asialoglycoprotein receptor (chiacchia and drickamer, 1984; holland et al., 1984; spiess and lodish, 1986) all belong to this class of membrane proteins. some steps in their membrane insertion must be similar to those of secretory and type i membrane proteins, as an srp- and docking protein-dependent membrane insertion has been demonstrated for some of them (spiess and lodish, 1986; lipp and dobberstein, 1986). membrane insertion might occur in a loop-like fashion, as this scheme can most easily explain how the different membrane topologies of membrane proteins are achieved (engelman and steitz, 1981). as type ii membrane proteins contain only a single stretch of hydrophobic amino acid residues, this stretch might function as a signal for membrane insertion as well as a membrane anchor (markoff et al., 1984; spiess and lodish, 1986).

to identify and characterize this sequence, we tested membrane insertion of the human invariant chain (iγ) and several deletion and fusion proteins derived from it in a cell-free membrane insertion system. iγ is a typical type ii membrane protein (claesson et al., 1983; strubin et al., 1984; lipp and dobberstein, 1986).
it exposes 30 amino-terminal residues on the cytoplasmic side, spans the membrane between residues 30 and 60, and exposes a large carboxy-terminal domain on the exoplasmic side. this domain has two sites for the addition of n-linked carbohydrate units. membrane insertion of iγ requires srp and docking protein (lipp and dobberstein, 1986). as the amino-terminal, cytoplasmic domain is hydrophilic and shows no resemblance to a signal sequence, it has been proposed that the membrane-spanning region, or part of it, functions as an internal, uncleavable signal sequence (dobberstein et al., 1983; claesson et al., 1983; lipp and dobberstein, 1986). we demonstrate here that the membrane-spanning region of iγ is composed of a potentially cleavable signal sequence fused to part of a membrane anchor, which together with the cytoplasmic domain determine the orientation of iγ in the er membrane. deletion of the cytoplasmic domain exposes the signal sequence at the amino terminus of the membrane-spanning region, resulting in cleavage of this otherwise uncleaved signal.

claesson et al., 1983). piγ: the complete iγ coding and all of its 3' noncoding sequence was cloned behind the t5 promoter (p) in the pds5 expression vector. piγcat: the portion downstream of the psti site in piγ was replaced by the chloramphenicol acetyltransferase (cat) gene, resulting in an in-frame fusion protein. pΔn-iγcat: the segment between the sau3a and sstii sites of piγcat coding for the cytoplasmic domain was deleted. a new atg initiation codon right in front of the membrane-spanning segment is provided by the vector (see figure 4a). the regions coding for protein are boxed. the membrane-spanning region of iγ is indicated by loops; the hydrophilic domains by dots. cat-derived sequences are indicated by slanted lines. the positions of the n-linked glycosylation sites in iγ and the potential n-linked glycosylation site in cat protein are indicated by asterisks.
relevant cleavage sites for restriction endonucleases are also indicated.

protein segments that perform a particular function can be identified by their deletion or addition to unrelated proteins. we used this approach to localize and characterize the region in iγ that is responsible for membrane insertion. deletions and fusions were made at the dna level after cloning of iγ cdna into an expression vector. messenger rna was transcribed from these plasmids and translated in a cell-free system. the resulting proteins were tested for their ability to insert into microsomal membranes (blobel and dobberstein, 1975; stueber et al., 1984).

plasmids piγ, piγcat, and pΔn-iγcat
we have shown previously that cdna sequences cloned behind the strong t5 promoter in pds5 can be transcribed very efficiently by e. coli rna polymerase (stueber et al., 1985). when transcription is performed in the presence of the cap analog 7mgpppa, the resulting mrna can be translated efficiently in eukaryotic cell-free systems. we have observed, however, that a stretch of gc residues at the 5' end of a cdna negatively affects expression of the resulting rna (unpublished observation). the iγ cdna construct (py-2) had been gc tailed and was inserted into the psti site of pbr322 (claesson et al., 1983). we deleted the 5' gc tail and cloned iγ cdna, or part of it, into the polylinker site of pds5 or p6/5r (see experimental procedures for details). piγ contains the entire iγ coding region behind the t5 promoter (figure 1). piγcat is an in-frame fusion between the 5' region of iγ, encoding the cytoplasmic domain, the membrane-spanning segment, and 12 amino acids of the exoplasmic portion of iγ, and the gene encoding the cytoplasmic protein chloramphenicol acetyltransferase (cat). the cat protein contains one potential site for the addition of n-linked oligosaccharide 36 amino acid residues downstream of its original initiator methionine.
in Δn-iγcat, the entire hydrophilic, cytoplasmic segment from iγ was deleted. the new initiator methionine is provided by the vector and is located in front of the hydrophobic segment.

in vitro translation and membrane insertion of iγ
when piγ was transcribed by e. coli rna polymerase and the resulting mrna translated in the wheat germ cell-free system, a single polypeptide species of 27 kd was obtained (figure 2, lane 1). this is the expected molecular weight for nonglycosylated iγ (claesson et al., 1983). when rough microsomes (rm), derived from dog pancreas, were added to the translation system, a higher molecular weight species of 33 kd appeared. this increase of 6 kd in molecular weight is consistent with the addition of two oligosaccharides to the two n-glycosylation sites. the 33 kd form, iγ*, was reduced in molecular weight by about 2 kd when proteinase k was used to remove the cytoplasmically exposed domain (figure 2, lanes 2 and 3). when protease digestion was performed in the presence of the detergent np40, iγ* was digested. these data suggest that iγ* is integrated into the membrane and exposes 20-30 amino acid residues on the cytoplasmic side and a 30 kd domain on the exoplasmic side of the membrane.

the identity of iγ and its glycosylated form was confirmed by immunoprecipitation with antibodies raised against the amino-terminal 72 (anti-iγn) or against the carboxy-terminal 144 (anti-iγc) residues of iγ. as shown in figure 2, lanes 5, 6, 8, and 9, these antibodies recognize glycosylated and nonglycosylated forms of iγ. no protein could be precipitated with anti-iγn antibody when the cytoplasmic domain was removed from membrane-integrated iγ* by protease digestion (figure 2, lane 7). as the antibody is directed against the amino-terminal portion of iγ, the data directly demonstrate that the amino terminus is located on the cytoplasmic side and is accessible to the protease.
with anti-iγc antibody, the processed form of iγ is readily detectable, demonstrating an exoplasmic location of the carboxy-terminal portion of iγ (figure 2, lane 10).

in vitro translation and membrane insertion of iγ. piγ was transcribed in the presence of the cap analog 7mgpppa by e. coli rna polymerase. the resulting mrna was translated in the wheat germ cell-free system in the absence (lanes 1, 5, and 8) or presence (lanes 2, 3, 4, 6, 7, 9, and 10) of rm. the membrane topology of iγ was determined by treatment with proteinase k (pk) (lanes 3, 7, and 10) or pk and the detergent np40 (lane 4). proteins were separated by sds-page and visualized by autoradiography. lanes 1-4 show total protein synthesized. samples characterized in lanes 5-7 were immunoprecipitated with an antibody raised against the amino-terminal 72 amino acid residues of iγ (anti-iγn); in lanes 8-10, with an antibody against the carboxy-terminal portion of iγ (anti-iγc).

membrane insertion of iγcat
an analysis of membrane insertion was performed for iγcat and cat as described above for iγ. cat was expressed from pds5. iγcat was synthesized in the absence of microsomal membranes as a 34 kd protein (figure 3, lane 1) and in the presence of microsomal membranes as a 37 kd protein called iγcat* (figure 3, lane 2). the increase in molecular weight is consistent with the addition of one n-linked oligosaccharide to the cat-derived portion. there is one potential site for n-linked glycosylation in the cat protein. after protease digestion in the presence of microsomal membranes, iγcat* is reduced in molecular weight by about 2 kd, suggesting that it exposes 20-30 amino acid residues on the cytoplasmic side (figure 3, lanes 2 and 3). cat protein obtained after transcription-translation from pds5 is not modified by the added microsomes. as expected, no shift in molecular weight can be seen (figure 3, lanes 5 and 6).
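the glycosylation reasoning above (27 kd to 33 kd for iγ with two sites, 34 kd to 37 kd for iγcat with one site) can be sketched as simple arithmetic. the snippet below is only an illustration under stated assumptions: the value of about 3 kd per n-linked oligosaccharide is inferred from the 6 kd shift attributed to two glycans in the text, the function names are hypothetical, and the asn-x-ser/thr pattern (x not pro) is the general consensus for n-linked glycosylation rather than something established in this study.

```python
import re

# assumption: each n-linked oligosaccharide adds roughly 3 kd on sds-page,
# the value implied by the 6 kd shift attributed to two glycans in the text.
KD_PER_GLYCAN = 3.0


def estimated_glycans(unglycosylated_kd, glycosylated_kd, kd_per_glycan=KD_PER_GLYCAN):
    """Estimate how many n-linked glycans would account for a gel shift."""
    return round((glycosylated_kd - unglycosylated_kd) / kd_per_glycan)


def find_sequons(protein_seq):
    """Return 1-based positions of asn in asn-x-ser/thr sequons (x is not pro)."""
    # the lookahead keeps overlapping candidate sites from hiding each other
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", protein_seq.upper())]


print(estimated_glycans(27, 33))  # iγ: 27 kd without rm, 33 kd with rm
print(estimated_glycans(34, 37))  # iγcat: 34 kd without rm, 37 kd with rm
print(find_sequons("NLT"))        # core of the acceptor peptide asn-leu-thr
```

apparent molecular weights read from gels are approximate, so the rounding step is what makes the estimate tolerant of small deviations from the assumed 3 kd per glycan.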
cat protein was very resistant to protease digestion even in the presence of np40 (figure 3, lanes 7 and 8). iγcat, in contrast, was very sensitive to added protease. this might reflect a difference in conformation between the free cat protein and the cat-derived portion in iγcat. the location of cat outside of the membrane vesicles can be demonstrated by sedimenting the membranes by centrifugation. cat protein is then found in the supernatant (data not shown). we conclude from the data obtained with iγcat and cat that the signal for membrane insertion must be located within the first 72 amino acid residues of iγ. to localize this signal more precisely, we deleted the first 30 residues of iγcat.

membrane insertion of iγcat and cat protein. rna derived from piγcat or pds5 was translated in the wheat germ cell-free system in the absence or the presence of rm. membrane insertion was tested by treatment with proteinase k (pk) and np40. addition of rm, pk, and np40 is indicated at the bottom of each lane.

membrane insertion of Δn-iγcat
in all secretory proteins the cleavable signal for membrane translocation is located at the amino-terminal end of the precursor polypeptide. the main feature of this signal appears to be its hydrophobicity. in iγ the only hydrophobic stretch of amino acid residues that resembles a signal sequence is located in the membrane-spanning region, about 30 amino acid residues away from the amino-terminal initiator methionine. we asked whether removal of the 30 amino-terminal residues in iγcat would affect its membrane insertion and topology. the cytoplasmic domain of iγcat was deleted and the initiator methionine was placed in front of the membrane-spanning segment. the amino-terminal sequences of iγcat and Δn-iγcat as deduced from the dna sequences are shown in figure 4a. when rna derived from pΔn-iγcat was translated in the wheat germ cell-free system, a single polypeptide of 29 kd was synthesized, Δn-iγcat (figure 4b, lane 1).
this was, as expected, about 3 kd smaller than the iγcat protein (figure 4b, lane 1). in the presence of microsomes, two new protein bands appeared, one about 1 kd smaller and one 2 kd larger than Δn-iγcat. both of these forms were resistant to proteinase k, indicating that they were inserted into or translocated across microsomal membranes (figure 4b, lanes 3 and 4). we suspected that the smaller molecular weight form was generated by signal peptidase cleavage without concomitant glycosylation and that the larger molecular weight form was glycosylated and cleaved by signal peptidase. these possibilities were tested.

from pΔn-iγcat was translated in the wheat germ cell-free system in the absence or presence of rm. membrane insertion and topology were tested by treatment with proteinase k (pk) and np40. components were added as indicated below the lanes. iγcat translated in the wheat germ cell-free system is shown for comparison.

processed and glycosylated
to detect the signal peptide cleavage of a glycosylated protein on a polyacrylamide gel, it is necessary to block its glycosylation but still allow membrane insertion to occur. addition of n-linked oligosaccharides onto nascent polypeptides can be blocked by including synthetic acceptor peptides in an in vitro membrane insertion assay (bause, 1983; lau et al., 1983). iγcat and Δn-iγcat were translated in the presence of microsomes with and without the acceptor peptide asn-leu-thr. the size of iγcat synthesized in the presence of rm and acceptor peptide was indistinguishable from that made in the absence of rm. when proteinase k was used to digest its cytoplasmically exposed domain, the size was reduced by about 2-3 kd (figure 5a). we can conclude that nonglycosylated iγcat synthesized in the presence of rm and acceptor peptide is inserted into the membrane in the same way as its glycosylated form and that no signal sequence is cleaved during membrane translocation (figure 5a; figure 5b, lanes 2 and 3).
Δn-iγcat' was also found to be protected against exogenous proteinase k (figure 5b, lane 4). this suggested to us that the larger form was glycosylated and proteolytically processed and that Δn-iγcat' was generated by a proteolytic cleavage, most likely by signal peptidase.

to determine the site of cleavage in the proteolytically processed forms of Δn-iγcat, the positions of leucine in the amino-terminal regions of Δn-iγcat and membrane-inserted Δn-iγcat*' were determined. Δn-iγcat was translated in the absence or presence of rm with [3h]leucine as label. as Δn-iγcat is essentially the only protein synthesized from pΔn-iγcat-derived mrna, the complete translation mixture was subjected to automated edman degradation. as seen in figure 6a, leucine residues are found at positions 3, 10, 13, 14, and 15, as predicted from the sequence deduced from py-2 cdna (claesson et al., 1983). the initiator methionine is probably removed during or shortly after translation (kozak, 1983). the positions of leucine residues in the membrane-translocated forms of Δn-iγcat were similarly determined. as rm in the in vitro assay do not translocate all chains, some cytoplasmic forms remained (see inserts in figures 6a and 6b). leucine residues were found at positions 1, 2, 3, and 13 (figure 6b). larger peaks at positions 3 and 10 are consistent with the presence of some unprocessed Δn-iγcat (see insert in figure 6b). taking into account the size reduction of about 1 kd caused by the processing, and the fact that the leucines at positions 1, 2, and 3 correspond to those at positions 13, 14, and 15 in authentic Δn-iγcat, we conclude that processing has occurred between amino acid residues 12 and 13 (figures 6b and 6c).
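the renumbering behind this conclusion can be checked with a short sketch. it takes the leucine positions reported for unprocessed Δn-iγcat and asks where they would appear in the edman cycles if a given number of amino-terminal residues were removed. the function name is hypothetical, and the snippet illustrates the arithmetic only; it is not the analysis used in the study.

```python
def positions_after_cleavage(labeled_positions, n_removed):
    """Renumber labeled residues after removal of n_removed amino-terminal residues.

    Positions are 1-based; residues inside the removed segment disappear.
    """
    return [p - n_removed for p in labeled_positions if p > n_removed]


# leucine positions reported for unprocessed Δn-iγcat (figure 6a)
unprocessed_leucines = [3, 10, 13, 14, 15]

# cleavage between residues 12 and 13 removes 12 residues
print(positions_after_cleavage(unprocessed_leucines, 12))
```

with 12 residues removed, the leucines at 13, 14, and 15 move to cycles 1, 2, and 3, matching the first three positions reported for the processed form. by the same arithmetic, the leucine seen at cycle 13 of the processed form would correspond to residue 25 of the unprocessed protein, which lies beyond the positions listed for it.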
proteolytically processed Δn-iγcat is translocated into the lumen of microsomal vesicles
with the proteolytic removal of 12 of the 30 hydrophobic amino acid residues in the membrane-spanning region of Δn-iγcat, the question arose as to whether the processed protein was still anchored in the membrane or whether it was now released into the lumen of the microsomal vesicles, as is the case for secretory proteins. we used the extractability with carbonate as a criterion for membrane integration. treatment of rm with carbonate at ph 11 releases proteins that are not integrated into the lipid bilayer as well as proteins present in the lumen of microsomal vesicles. Δn-iγcat was translated in the presence of rm. membranes were isolated by centrifugation through a sucrose cushion and resuspended in carbonate buffer. solubilized components were then separated from membranes by centrifugation. proteins in the membrane pellet and supernatant were analyzed by sds-page and autoradiography. the membrane-spanning proteins iγ and iγcat and the secretory protein mouse granulocyte-macrophage colony stimulating factor (gm-csf) were used as controls (gough et al., 1985). as shown in figure 7, iγ* and iγcat*, as expected for membrane-spanning proteins, were found in the membrane fraction. both Δn-iγcat' and gm-csf' were found essentially in the soluble, carbonate-released fraction. thus Δn-iγcat' is released after the proteolytic processing into the lumen of the microsomal vesicles. proteolytic processing, as described above for Δn-iγcat, was also obtained for Δn-iγ, a protein that lacks the amino-terminal 30 residues of iγ (data not shown).

our results show that the membrane-spanning segment of the type ii membrane protein iγ contains a potentially cleavable signal sequence. this signal sequence is located in the amino-terminal half of the membrane-spanning segment, and it is cleaved when the preceding cytoplasmic domain is removed.
all properties known to identify a signal sequence and a cleavage by signal peptidase can be demonstrated.

to restrict vertical mobility of the membrane-spanning segment. (b) Δn-iγcat, during its initial stage of membrane insertion, also spans the membrane with its hydrophobic segment. however, as no charged amino acid residues are present at the extreme amino-terminal end, the hydrophobic segment has some freedom to change its topology across the membrane. part of the hydrophobic segment might now be pulled into the lumen of the er membrane, and a former cryptic site for signal peptidase cleavage might become accessible to the active center of signal peptidase.

first, the cleavage occurs concomitant with insertion into the er membrane, as is typical for cleavable signal sequences of presecretory proteins (blobel and dobberstein, 1975). second, the cleaved segment is located at the amino-terminal end of the deletion protein Δn-iγcat. it is 13 amino acid residues long and composed entirely of hydrophobic or uncharged residues. signal sequences can vary in length from about 15 to over 60 residues. the only structural element identified so far for a signal sequence is its hydrophobic core, usually 8-12 residues long. it is followed by a more polar region 5-7 residues long, which is thought to define the cleavage site for signal peptidase. thus, a "minimal" signal sequence would be composed of an 8 residue hydrophobic core followed by a 5 residue region conferring cleavage specificity (von heijne, 1983, 1985). the segment cleaved from Δn-iγcat would be consistent with such a minimal length signal sequence. finally, the amino acid residues around the cleavage site in membrane-translocated Δn-iγcat*' are consistent with cleavage by signal peptidase.
based on a sequence comparison of 78 eukaryotic signal sequences, von heijne found that only small neutral residues are found at the site of cleavage (-1 position) and that only small neutral and uncharged ones are found at the -3 position, that is, 3 amino acid residues in front of the signal peptidase cleavage site (von heijne, 1983). in the segment cleaved from Δn-iγcat, threonine, a small neutral amino acid, is found at the -1 position, and leucine, an uncharged amino acid, at the -3 position. both of these residues fulfill the above described criteria for a signal peptidase cleavage site. thus, place (rm) and time of cleavage (cotranslational), hydrophobic character of the cleaved segment, and property of the cleavage site demonstrate that Δn-iγcat contains a signal sequence at its amino terminus which is cleaved upon membrane insertion by signal peptidase.

how can we explain that the deletion of the cytoplasmic, hydrophilic segment from iγcat reveals a cleavable signal sequence in a formerly membrane-spanning region? to us the most plausible explanation is that the position of the hydrophobic segment in the membrane is different in iγcat and Δn-iγcat. signal peptidase is known to be an integral membrane protein not exposed on the cytoplasmic side of rm (jackson and blobel, 1977; lively and walsh, 1983; evans et al., 1986). as in many secretory proteins, the cleavage site for signal peptidase is surrounded on either side by 1 or even 2 charged amino acid residues. it is reasonable to assume that the active center of this enzyme is located close to the exoplasmic side of the er membrane, not within the membrane. we propose that the removal of the cytoplasmic, hydrophilic segment from iγcat allows the hydrophobic segment to shift its position within the er membrane. most likely it positions itself more toward the exoplasmic side. hence, a potential signal peptidase cleavage site becomes accessible to the active center of signal peptidase (see figure 8b).
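the criteria discussed above (an uncharged cleaved segment, a small neutral residue at -1, and an uncharged residue at -3) can be written as a small check. this is a minimal sketch under loud assumptions: the residue sets below are illustrative simplifications chosen for this example and are not taken from von heijne's tables, the function names are hypothetical, and the toy peptide is invented for illustration and is not the iγ sequence.

```python
# assumption: residues treated as charged for the purpose of this sketch
CHARGED = set("DEKR")
# assumption: an illustrative "small neutral" set, not von heijne's exact list
SMALL_NEUTRAL = set("AGSTC")


def cleavage_site_ok(seq, n_cleaved):
    """Check the -1/-3 pattern for cleavage after residue n_cleaved (1-based)."""
    if n_cleaved < 3 or n_cleaved > len(seq):
        raise ValueError("cleavage site must leave at least 3 residues before it")
    minus1 = seq[n_cleaved - 1]  # last residue of the cleaved segment
    minus3 = seq[n_cleaved - 3]  # third residue before the cleavage site
    return minus1 in SMALL_NEUTRAL and minus3 not in CHARGED


def segment_is_uncharged(seq, n_cleaved):
    """True if the cleaved segment contains no charged residues."""
    return not any(aa in CHARGED for aa in seq[:n_cleaved])


# hypothetical toy peptide with leu at -3 and thr at -1 for a cut after residue 12
toy = "AVLLAGVLALATQGRK"
print(cleavage_site_ok(toy, 12), segment_is_uncharged(toy, 12))
```

a site that passes both checks is only consistent with signal peptidase cleavage; as the text argues, whether cleavage actually occurs also depends on how the hydrophobic segment is positioned in the membrane.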
it has been noted previously in type i membrane proteins that a deletion of the charged amino acid residues flanking the membrane-spanning region does not affect the overall topology (zuniga and hood, 1986; cutler et al., 1986). in the case of the e2 glycoprotein of semliki forest virus, it has been shown that mutation of the basic amino acid residues at the cytoplasmic side of the membrane-spanning segment reduces the stability of the mutant protein in the membrane (cutler et al., 1986).

when the membrane-spanning regions of type i and type ii membrane proteins are compared, no obvious structural difference can be found. in both types of membrane proteins these regions comprise a stretch of 20 to 30 hydrophobic amino acid residues that is flanked on the cytoplasmic side by positively charged amino acid residues. in type i membrane proteins the segment spanning the membrane does not appear to participate in the initial stage of membrane insertion. type i membrane proteins usually have cleavable signal sequences that initiate the membrane translocation of the amino-terminal half of the protein. the membrane-spanning region, in its position close to the carboxy-terminal end, seems only to function in anchoring the protein in the membrane. yost et al. placed the membrane-spanning segment of the murine surface immunoglobulin heavy chain close to the amino-terminal end of a fusion protein (yost et al., 1983). in this position the segment did not provide the signal function for membrane insertion. as, however, a hydrophilic segment of about 40 amino acid residues precedes the membrane-spanning segment, the question still remains as to whether a membrane-spanning region from a type i membrane protein, when placed into the appropriate surrounding, can also initiate translocation across the er membrane. it is quite conceivable that certain hydrophilic sequences preceding a hydrophobic segment play a crucial role in exposing a potential signal for membrane insertion.
up to now no special structural features, besides hydrophobicity, are known to be crucial for the function of a signal sequence.

a common step has been proposed for the early stage of membrane insertion of secretory and membrane proteins (dobberstein et al., 1983; spiess and lodish, 1986; lipp and dobberstein, 1986). this was based largely on the finding that both of these types of proteins require srp and docking protein for their membrane insertion. here, we show that a type ii membrane protein can be converted into a secretory protein by removal of the cytoplasmic segment. this directly demonstrates that the signal for membrane insertion of these two types of proteins can be the same. further deletion into the carboxy-terminal half of the iγ hydrophobic segment is required to elucidate whether the cleaved signal sequence contains all the information for membrane insertion. it is conceivable that the functional signal sequence extends beyond the cleaved signal sequence into the adjacent hydrophobic part. for some secretory proteins it has been observed that the cleavable signal sequence is not sufficient for membrane insertion. in the case of staphylococcal protein a, sequences of the amino-terminal part of the mature protein are required for membrane insertion and correct processing (abrahmsen et al., 1985).

srp can arrest elongation of presecretory and type ii membrane proteins after 70 or even more amino acid residues have been polymerized (walter and blobel, 1981a; meyer et al., 1982; lipp and dobberstein, 1986; lipp et al., unpublished data). these domains are then inserted into the er membrane by an as yet unknown mechanism. as the amino terminus of a type ii membrane protein has to remain on the cytoplasmic side, the formation of a loop during membrane insertion has been proposed. in the case of a secretory protein, signal peptidase would be able to act as soon as the loop appears on the exoplasmic side.
an initial interaction of basic residues in a signal sequence with the phosphates of the membrane lipids was originally proposed by inouye for the lipoprotein of e. coli (inouye et al., 1977). our results rule out an essential role of these basic residues in er membrane insertion. the Δn-iγcat protein does not contain any charged amino acid residues preceding the hydrophobic segment. it is nevertheless translocated across the er membrane and processed. the rules that define the cleavage site for signal peptidase in presecretory proteins are not yet fully understood. von heijne points out that the types of amino acids at the -1 and -3 positions in front of the site of cleavage are important in assigning a cleavage site. here we show that sequences at the very beginning of a signal sequence can also influence cleavage by signal peptidase. in the case of iγ, the charged residues preceding the hydrophobic segment can prevent cleavage by signal peptidase. the variability in the length and in the number of charged amino acid residues at the amino terminus of a signal sequence has not as yet been explained. mutation and deletion experiments have clearly shown that charged residues are not essential for membrane insertion. in the light of our findings, we propose that the charged amino acids at the amino terminus of signal sequences function in the alignment of signal sequences in the er membrane such that signal peptidase can cleave at a very specific site with high fidelity. our prediction is that removal of charged residues from the amino-terminal end of signal sequences can lead to altered or less specific signal peptidase cleavage.

wheat germ was obtained from general mills, california. the acceptor peptide benzoyl-asn-leu-thr-n-methylamide was a generous gift from e. bause, cologne. standard molecular cloning techniques, as described by maniatis et al. (1982), were used.
the cdna clone py-2, containing the entire coding region of the human invariant chain cloned into the psti site of pbr322, was obtained from p. a. peterson's laboratory, uppsala, sweden (claesson et al., 1983). the expression plasmids pds5, pds6, and pds5/3 have been described previously (stueber et al., 1984). they allow efficient transcription by e. coli rna polymerase of cdnas cloned behind the strong t5 promoter p25. figures 9a and 9b summarize the construction of the fusion and deletion plasmids described below.

piγcat: py-2 was digested with psti, and the 317 bp fragment containing the 5' end and the 660 bp fragment containing the 3' end of the iγ coding region were isolated. the 317 bp fragment, coding for the iγ cytoplasmic domain, the membrane-spanning segment, and 12 amino acid residues of the exoplasmic domain, was cleaved by sau3a to remove the sgc tail. the 234 bp sau3a-psti fragment was isolated and cloned into bamhi/psti-cut pds5. this results in an in-frame fusion of the 5' end of iγ to the cat gene.

piγ: initial attempts to clone the complete coding region into pds5 failed. when expressed, this region is probably lethal to the bacterium. to repress transcription from the t5 promoter/operator (p/o) in bacteria, we cloned the lac i repressor between the bla gene and the t5 p/o. this plasmid is called priγcat. for the construction of piγ, priγcat was linearized by psti and the 660 bp psti fragment, coding for the carboxy-terminal domain of iγ, was ligated into this site. transformants containing the 660 bp fragment were screened for expression of immunoprecipitable iγ chain after in vitro transcription-translation.

to delete the cytoplasmic domain from iγcat, the 950 bp sstii-xbai fragment from piγcat was isolated and ligated at the xbai site of bamhi/xbai-cut p6/5r. the protruding ends at the bamhi and the sstii sites were blunted with s1 nuclease and ligated.
as a result, a new atg initiation codon is placed just in front of the membrane-spanning segment of iγ. the construction was confirmed by dna and amino acid sequence analyses (see figure 4a).

p6lsr: to repress transcription from the t5 promoter, the lac i gene was inserted between the bla gene and the t5 p/o region of pds5/3 (stueber et al., 1984).

antibodies against iγ domains: to raise antibodies against the amino- and the carboxy-terminal domains of iγ, fusion proteins of β-galactosidase and parts of iγ were produced in bacteria and used as antigens to raise antibodies in rabbits. from a psti digest of py-2, the 317 bp fragment coding for the amino-terminal 72 amino acids of iγ and the 860 bp fragment coding for the exoplasmic carboxy-terminal domain of iγ were isolated. each of the fragments was inserted into the psti site of the bacterial expression vector pex1 (stanley and luzio, 1984). fusion proteins expressed in nfl bacteria were separated on preparative sds-polyacrylamide gels (7% acrylamide; laemmli, 1970). protein bands were visualized by koac precipitation, and fusion proteins were eluted from gel slices. two rabbits were immunized with each of the two fusion proteins. antibodies against the amino terminus of iγ (anti-iγn) and its carboxyl terminus (anti-iγc) were obtained. they reacted with authentic iγ chains synthesized by human raji cells (data not shown).

immunoprecipitations: after translation and posttranslational assays, antigens in a 25 µl aliquot were solubilized by adding nonidet-p40 (np40) to 0.5%. then 1 µl of either anti-iγn or anti-iγc antiserum was added and the mixtures incubated for 15 min at 4°c. forty microliters of a 1:1 slurry of protein a-sepharose (equilibrated in 0.2% np40, 10 mm tris-hcl [ph 7.5], 150 mm nacl, and 2 mm edta) was added to each sample, and incubation continued for 60 min at 4°c.
beads were sedimented by centrifugation and washed three times with 0.2% np40, 10 mm tris-hcl (ph 7.5), 150 mm nacl, and 2 mm edta, twice with 0.2% np40, 10 mm tris-hcl (ph 7.5), 500 mm nacl, and 2 mm edta, and once with 10 mm tris-hcl (ph 7.5). sample buffer for sds-page was added to the sedimented beads, and antigens were analyzed by sds-page and fluorography.

in vitro transcription and translation: plasmids were transcribed in vitro by e. coli rna polymerase, and the resulting mrna was translated in a wheat germ cell-free system as described by stueber et al. (1984). to test for membrane translocation, rough microsomes from dog pancreas were included in the translation (blobel and dobberstein, 1975). glycosylation of asparagine residues was blocked by the addition of the acceptor peptide benzoyl-asn-leu-thr-n-methylamide to a final concentration of 30 µm (lau et al., 1983; bause, 1983).

assays: to test translocation of in vitro-synthesized proteins across, or their insertion into, the er membrane, accessibility to proteinase k was used. a 10 µl aliquot of a translation mixture containing rough microsomes was incubated for 10 min at 25°c with either 0.3 mg/ml of proteinase k or 0.3 mg/ml of proteinase k and 0.5% np40. further proteolysis was stopped by the addition of phenylmethylsulfonyl fluoride (pmsf) to 0.1 mg/ml, and the sample was further characterized by sds-page (laemmli, 1970) and fluorography or, where indicated in the figure, by immunoprecipitation. to remove secretory and peripheral membrane proteins, rough microsomes were subjected to a carbonate wash with 0.1 m na2co3, ph 11 (fujiki et al., 1982).

we thank p. a. peterson and l. claesson, uppsala, for plasmid py-2; e. bause, cologne, for the acceptor peptide; h. gausepohl for performing automated amino acid analysis; m. t. haeuptle, i. ibrahimi, and d. meyer for critical reading of the manuscript; and annie steiner for expert typing. this work was supported by grant do 199/4-z from the deutsche forschungsgemeinschaft.

the costs of publication of this article were defrayed in part by the payment of page charges. this article must therefore be hereby marked "advertisement" in accordance with 18 usc section 1734 solely to indicate this fact.

received april 23, 1986; revised july 1, 1986.

multiple mechanisms of protein insertion into and across membranes.
a stop transfer sequence confers predictable transmembrane orientation to a previously secreted protein in cell-free systems.
clonal variation in cell surface display of an h-2 protein lacking a cytoplasmic tail.
abrahmsen, l., moks, t., nilsson, b., hellman, u., and uhlen, m. (1985). analysis of signals for secretion in the staphylococcal protein a gene. embo j. 4, 3901-3906.
adams, g. a., and rose, j. k. (1985). structural requirements for a membrane-spanning domain for protein anchoring and cell surface transport. cell 41, 1007-1015.
anderson, d. j., mostov, k. e., and blobel, g. (1983). mechanisms of integration of de novo-synthesized polypeptides into membranes: signal recognition particle is required for integration into microsomal membranes of calcium atpase and of lens mp26 but not of cytochrome b5. proc. natl. acad. sci. usa 80, 7249-7253.
bause, e. (1983). structural requirements of n-glycosylation of proteins. biochem. j. 209, 331-336.
blobel, g., and dobberstein, b. (1975).
lingappa, v. r., katz, f. n., lodish, h. f., and blobel, g. (1978). a signal sequence for the insertion of a transmembrane glycoprotein. similarities to the signals of secretory proteins in primary structure and function. j. biol. chem. 253, 8667-8670.
lipp, j., and dobberstein, b. (1986). signal recognition particle-dependent membrane insertion of mouse invariant chain: a membrane spanning protein with a cytoplasmically exposed amino-terminus. j. cell biol. 102, 2169-2175.
long, e. o. (1985).
in search of a function for the invariant chain associated with ia antigens. surv. immunol. res. 4, 27-34.
lively, m. o., and walsh, k. a. (1983).
spiess, m., and lodish, h. f. (1986). an internal signal sequence: the asialoglycoprotein receptor membrane anchor. cell 44, 177-185.
stanley, k. k., and luzio, j. p. (1984). construction of a new family of high efficiency bacterial expression vectors: identification of cdna clones coding for human liver proteins. embo j. 3, 1429-1434.
strubin, m., mach, b., and long, e. o. (1984). the complete sequence of the mrna for the hla-dr associated invariant chain reveals a polypeptide with an unusual transmembrane polarity. embo j. 3, 869-872.

key: cord-269023-g21a9ik2 authors: mukherjee, siddhartha title: before virus, after virus: a reckoning date: 2020-10-15 journal: cell doi: 10.1016/j.cell.2020.09.042 sha: doc_id: 269023 cord_uid: g21a9ik2

the 2020 lasker awards, a celebration of one of the most prestigious international prizes given to individuals for extraordinary contributions to basic and clinical medical research, public health, and special achievement, was cancelled because of the covid-19 pandemic. typically, essays on the awardees and their scientific and medical contributions are solicited and published in cell in collaboration with the lasker committee. this year, the lasker committee commissioned an essay to reflect on the historic contributions that scientists and physicians have made to our understanding of immunology and virology, and future directions in medical and basic research that have been highlighted by the covid-19 pandemic.

''if you think research is expensive, try disease.'' -mary lasker

in the summer of 1882, a russian professor of zoology, elie metchnikoff (also called ilya mechnikov), quarreled with his colleagues at the university of odessa. he was a temperamental man with a depressive streak, with scientific interests that ranged from the embryology of cuttlefish to the digestive system of flatworms. but he was often in conflict with his colleagues, and in '82, he moved to sicily, where he set up a private laboratory (gordon, 2008). in messina, where the warm, shallow, windy beaches yielded a constant wealth of marine animals, metchnikoff began to experiment with starfish. alone one evening-his wife and children had gone to watch the local circus-metchnikoff devised an experiment that would change our understanding of immunity. the starfish larvae were semi-transparent; he had been watching cells move about in their bodies. he was particularly interested in the movement of the cells after injury. what if he stuck a thorn in one of the starfish's feet? he spent a sleepless night and returned to the experiment the next morning. a group of motile cells-a ''thick cushion layer''-had accumulated busily around the thorn. he had, in essence, observed the first steps in inflammation and immune response: the recruitment of immune cells to the site of injury. the immune cells moved toward the site of inflammation actively-i.e., on their own.
''[t]he accumulation of mobile cells round the foreign body is done without any help from the blood vessels or the nervous system,'' he wrote, ''for the simple reason that these animals do not have either the one or the other. it is thus thanks to a sort of spontaneous action that the cells group round the splinter'' (mechnikov, 1967). by the mid-1880s, the splinter of the idea-immune cells being recruited actively to inflammatory sites to launch a response-led to a series of monumental experiments. the immune cells, he found, tried to ingest-eat-the infectious agent or irritant that had accumulated at the site. the phenomenon was called ''phagocytosis''-or eating (of an infectious agent) by an immune cell (metchnikoff, 1884). in an extraordinary series of papers published in the mid-1880s-a body of work that would eventually win him the nobel prize (table 1)-metchnikoff described the relationship between an organism and its invaders as ''kampf''-a ''drama unfolding within organisms'' that was like a perpetual struggle. he wrote, ''a battle takes place between the two elements [i.e., the microbe and the phagocytic cells]. sometimes the spores succeed in breeding. microbes are generated that secrete a substance capable of dissolving the mobile cells. such cases are rare on the whole. far more often it happens that the mobile cells kill and digest the infectious spores and thus ensure immunity for the organism'' (mechnikov, 1967). as i write this, we are in mid-struggle against a minuscule, deadly pathogen that has swerved the course of human history. what words does one use-what phrases-to adequately capture the difference in living in the bv versus the av-before virus and after virus?
to witness the sights and sounds of this struggle is to realize that life has been pushed off its known orbit forever: the constant beeping of alarms in the wards that eventually merged together into a mind-numbing wall of sound; the terror and confusion written across the brow of a (masked) cancer patient who was told that he had the virus; and, above all, the hideous damnation of dying alone, with a handheld camera as the only fragile connection with your family-''dying on iphone,'' as one doctor friend described it. this is not a moment to celebrate, but to reflect and recalibrate; it is a moment of introspection, perhaps even of revision. we need to look back to move forward. and so this essay looks back at history-of virology, vaccinations, and immunology-and asks: what have we learned, and what must be revisited? we knew about immunity long before we knew about the immune system. as early as 1500, medical healers in china had realized that those who survived smallpox did not catch the illness again (survivors of the disease were enlisted to take care of new victims) and inferred that the exposure of the body to an illness must protect it from future instances of that illness. chinese doctors ground smallpox scabs into a powder and insufflated it into a child's nose with a long pipe (jannetta, 2007). vaccination with live virus was a tightrope walk: if the viral inoculum in the powder was too large, the child, instead of acquiring immunity, would acquire a full-fledged version of the disease-a devastation that occurred about one in a hundred times. if all went well, the child would have a mild, local experience of the disease, and be immunized for life. in the seventeen-sixties, traditional healers in sudan practiced tishteree el jidderee (''buying the pox''); a healer, typically a woman, haggled with a mother over the price of her sick child's ripest pustules (bayoumi, 1976).
it was an exquisitely measured art: the most astute among the healers recognized the lesions that were likely to yield just enough viral material, but not too much. the differing sizes and shapes of the pustules led to the european name for the disease: variola, from variation. the process of immunizing against the pox was called ''variolation.'' in may 1796, a young physician named edward jenner proposed a safer approach to smallpox vaccination. he used material from pustules of cowpox-a disease caused by a virus related to smallpox-harvested from a young dairymaid, sarah nelmes, and inoculated the son of his gardener, an 8-year-old boy named james phipps, with it. in july that year, he inoculated the boy again, but this time with material from a smallpox lesion. although jenner had breached virtually every boundary of ethical human experimentation (there is, for instance, no record of informed consent, and the subsequent ''challenge'' with live virus might well have been lethal to the child), it apparently worked: phipps did not develop smallpox. after facing initial resistance from the medical community, jenner increased his vaccination efforts and became broadly celebrated as the father of vaccination (even the word ''vaccine'' carries the memory of jenner's experiment; it is derived from ''vacca,'' latin for cow) (riedel, 2005). yet even this story, retold and recycled in textbooks, is riddled with misattributions (history, too, has its revisions). the virus carried in sarah nelmes' pox lesions may have been horsepox, not cowpox (even jenner acknowledged the fact: ''the disease makes its progress from the horse [as i conceive] to the nipple of the cow, and from the cow to the human subject,'' he wrote).
nor, perhaps, was jenner the first vaccinator: in 1774, benjamin jesty, a prosperous farmer from yetminster village in dorset, convinced by the stories of dairymaids who frequently got cowpox and seemed immune to smallpox, supposedly harvested lesions from the udder of an infected cow, and inoculated his wife and two sons. jesty became an object of ridicule among physicians and scientists-but his wife and children survived the smallpox epidemic without catching the disease (hammarsten et al., 1979). but how did inoculation generate immunity, particularly long-term immunity? some factor produced in the body must be able to counter the infection and also retain a ''memory'' of the infection over multiple years. in 1888, the biochemist paul ehrlich (ehrlich, 1891) was traveling to egypt when he heard an extraordinary (and possibly apocryphal) story of a snake-charmer who, having been repeatedly bitten by a cobra during his childhood, had become resistant to subsequent attacks by cobra venom. ehrlich believed that an ''antivenin'' substance must have been generated in the snake-charmer's body. in 1890, in berlin, emil von behring and kitasato shibasaburo launched a series of experiments to understand how immunity to toxins and venoms might arise. among the most dramatic of these experiments was the demonstration that the serum of an animal exposed to tetanus, or to diphtheria toxin, could be transferred to another animal and confer immunity to tetanus or diphtheria (behring and kitasato, 1890). in a rather desultory footnote to the diphtheria paper, von behring first used the word ''antitoxisch''-or anti-toxin-to describe the activity of the serum (lindenmann, 1984). in 1891, in a wide-ranging, speculative paper entitled ''experimental studies on immunity'', ehrlich pushed scientists to imagine the material nature of this ''activity.'' he boldly coined the word ''anti-korper''-anti-body.
the word ''korper''-from corpus, or body-signaled his growing conviction that an ''antibody'' was an actual chemical substance-a ''body'' generated to defend the body. where did these antibodies come from? in the 1940s, the danish physiologists mogens bjørneboe and harald gormsen and their swedish colleague, astrid fagraeus (fagraeus, 1947), showed that the serial inoculation of rabbits with vaccines or toxins caused a particular cell type, called plasma cells, to expand and secrete antibodies. the origin of these plasma cells was traced back to a particular class of white blood cell called a b cell (bjørneboe and gormsen, 1942). drawing on this early work, max cooper, a young biologist working with robert good in minnesota, followed the trail of a report first published in a poultry journal and demonstrated that in chickens, b cells were generated in an organ called the bursa of fabricius, found near their cloaca (the organ had been described by the medieval anatomist hieronymus fabricius). when cooper removed the bursa in irradiated hatchlings, there were no b cells, and no antibodies. in humans, though, there was no bursa (cooper et al., 1965). instead, b cells were eventually found to originate in white-blood progenitors, typically found in the bone marrow. but the puzzle of how a plasma cell might learn to produce a specific antibody to bind an antigen-a biological molecule that was a yang to an antigen's yin-remained unsolved until the late 1950s. in the 1800s, ehrlich had proposed a magnificent theory. every cell in the body, he argued, displayed an immense set of unique proteins-''side chains,'' as he called them-attached to its surface. the side chains were shaped in the form of cognate opposites, or inverted shapes, to the toxin or antigen-like a lock to a key, or a mold to a statue. when a toxin or pathogenic substance bound to one such side chain in a cell, the cell increased the production of that side chain.
with repeated exposures to the antigen, ehrlich speculated, the side chain was ultimately released into the blood, thereby producing an antibody. but the theory required every immune cell to come pre-loaded with side chains carrying an inverted universe of all molecules-a mind-boggling cosmos of antibodies that had to be present in every immune cell. decades later, the chemist linus pauling proposed an even more rococo theory: the specificity of an antibody for its cognate antigen was created by an antibody folding around an antigen and acquiring the inverted shape of the antigen. the antigen, in short, was like a mold that ''instructed'' an antibody how to form around it. but the ''instruction'' and the ''infinite side chain'' theory were both conceptually implausible: proteins couldn't be made to fold around antigens, like medieval drapery, nor could a cell display an infinite variety of side chains, awaiting release. the most plausible solution to the conundrum of how antibodies were generated, and how they became antigen specific, was eventually proposed in an obscure paper published in 1957 in the australian journal of science by a melbourne scientist, frank macfarlane burnet (burnet, 1976), who drew on earlier work by niels jerne and david talmage. what if, burnet reasoned, every b cell expressed only one antibody? in short, a massive ''repertoire'' of antibodies was already present in the immune cells of the body, and it was the antibody-expressing cell-not the antibody itself-that was selected, and grew, when it bound the antigen. ''[i]t is tempting to consider that one of the multiplying units in the antibody response is the cell itself,'' talmage had written. ''[but] only those cells are selected for multiplication whose synthesized product has affinity for the antigen injected.'' (the italics are mine.)
burnet, following this line of thought, reasoned that it was this clonal proliferation of an immune cell-a cell stimulated by the binding of an antigen-that enabled the antibody response. at oxford, james gowans discovered that the ''burnetian repertoire'' (as it came to be called) was carried by circulating small lymphocytes that divided rapidly in response to antigens. when he transferred these active lymphocytes-later found to be b cells-from an antigen-exposed animal to a naive animal (an ingeniously simple experiment), gowans found that he could transfer antibody-mediated immunity as well. as the geneticist joshua lederberg wrote with remarkable prescience (yet without experimental evidence), ''do antigens bear instructions for antibody specificity [as pauling had argued] or do they select cell lines [that are specific for the antigen-i.e. by clonal selection]''? lederberg clearly favored the second theory (lederberg, 1959). the molecular ''shape'' of an antibody was also soon solved: between 1959 and 1962, gerald edelman (edelman and poulik, 1961) and rodney porter (porter, 1959), working at the rockefeller university in new york and oxford university, respectively, discovered that most antibodies are y-shaped molecules (some subclasses of antibodies have modifications to this shape). the two outer tines of the y bind to the antigen, each acting like a prong. the shaft, or the stem, of the y, serves many functions. macrophages use the shaft to capture antibody-bound microbes, viruses, and peptide fragments and swallow them, much like the shaft of a fork is used to pull food into the mouth. this, indeed, is one mechanism of ''phagocytosis''-cells eating microbes-the phenomenon that metchnikoff had observed. the shaft or stem of the y has yet other purposes: it also attracts a cascade of toxic immune factors to attack microbial cells.
the genetics of how immune cells make such a diverse repertoire of antibodies-a unique antibody type per cell-was worked out, piece by piece, by susumu tonegawa, leroy hood, and phil leder. it involved the regulated shuffling of dna within the b cell-the recombination of genetic modules, followed by more mutations to create a ''mature'' antibody-a strategy that lederberg had loosely, and presciently, proposed years earlier. in 1961, a thirty-year-old phd student in london, jacques miller, discovered the function of a human organ that most scientists had long forgotten. the thymus-named because it vaguely resembles the lobe-shaped leaves of the thyme plant-was, as galen described it, ''a bulky and soft gland'' that sat above the heart. even galen noted that it slowly involuted as humans grew older. and when the organ was removed from adult animals, nothing significant happened. a dwindling, dispensable, involuting organ; how could it possibly be essential for human lives? scientists began to think of the thymus as a vestigial detritus left behind by evolution-an appendix or a tailbone hanging, incidentally, above the heart. but might it have a function during fetal development? using minute forceps and the thinnest silk sutures, miller removed the thymus from neonatal mice about sixteen hours after birth. the effect was unexpected and dramatic. the lymphocytes in the blood-the white cells in the blood that were not macrophages or monocytes-dropped dramatically, and the animals became increasingly susceptible to common infections. b cells dropped in number, but some other white cell-some previously unknown type-was even more dramatically diminished. many of the mice died of the mouse hepatitis virus; many had bacterial pathogens colonize their spleens.
by the mid-1960s, miller had realized that the thymus was the site of maturation for a different kind of immune cell-not a b cell, but a t cell, from the word ''t-hymus'' (max cooper, working independently, had also established that two kinds of lymphocytes existed, and that the thymus was the maturation site for t cells). but if b cells generate antibodies to kill microbes, what do t cells do (miller, 2020)? in the 1970s, rolf zinkernagel and peter doherty, immunologists working in australia, provided the first clue. they began with so-called killer t cells: these t cells would recognize the virus-infected cells, perforate their cell membranes, and douse them with toxins, forcing the infected cells to shrivel and die, thereby purging the virus within the cell as a result. these t cells would be eventually known as cytotoxic (i.e., ''cell killing'') t cells, and they carried a marker on their surface: cd8 (zinkernagel and doherty, 1974). but the peculiar thing about these cd8-positive t cells, zinkernagel and doherty discovered, was that they had a capacity to recognize viral infections only in the context of the ''self''-i.e., only if the t cell and the infected cells came from the same strain of mouse. it was as if the t cell was capable of computing a kind of dual logic. first: does the cell that i am surveying belong to my body? and second: is it infected with a virus or a bacterium? using genetic techniques, zinkernagel and doherty tracked the detection of the ''self'' to a molecule called major histocompatibility complex (mhc) class i-a protein that comes in thousands of variants. each of us carries a unique combination of mhc class i genes. it is this ''self'' mhc that the t cell first detects. it is as if the mhc protein is a frame. without the right frame, or context, the t cell cannot even see the picture. the zinkernagel-doherty experiments had solved one half of the logic problem.
but how does a cd8 cell find a self-cell with a virus embedded within it? my doctoral mentor, alain townsend, first at mill hill in london, and then at oxford, took up this question in the 1980s. townsend began his experiments with cd8 killer t cells and influenza virus. some of these killer t cells elicited by flu infection, researchers had found, were detecting the presence of the influenza protein, called np, inside a flu-infected cell (townsend et al., 1986). but that's where the mystery began. ''that protein, np, never makes it to the cell surface intact,'' townsend told me recently. we were sitting in a london taxi cab, returning from a lecture. it was london dusk, with its mix of smog and rain and sudden shards of oblique english light, and the streets, as we sped through them-old bond, bury street-were full of houses with partially lit windows and closed doors. how could you detect a resident inside one of these houses, unless the resident happened to poke his head outside? ''np is always inside the cell,'' alain continued. he performed the most sensitive tests-assay upon assay, week upon week-to find the np protein on the flu-infected cell's surface, where a t cell might detect it. but it wasn't there. ''as far as cell surface proteins are concerned, there is nothing for a np-detecting t cell to see. it's invisible on the cell surface-it isn't even there-and yet it's perfectly visible to the t cell'' (a. townsend, personal communication). how, then, was the t cell detecting np? the crucial discoveries came in the late 1980s. the cd8 killer t cells, alain found, were not recognizing intact np, poking its face outside the cell. rather, the cells were detecting viral peptides-small pieces, or fragments, of the viral protein, np.
and crucially, these peptides had to be ''presented'' to the t cells in the right ''frame''-in this case, carried, or loaded, by the class i mhc protein-the very protein that zinkernagel and doherty had implicated in the killer t cell response. the class i protein was actually a carrier, a peptide-bearer-and thus the ''frame'' required for the recognition by a cd8 t cell. in the 1990s, working in parallel, emil unanue began to explore the immune detection of microbes that are internalized by cells-a la metchnikoff. once phagocytosed, the microbes and their debris are targeted to compartments, such as the lysosome, chock-full of degrading enzymes, that can chop the proteins into peptides. and analogous to what townsend had found, these peptide fragments from the microbes are bound by a related class of protein carriers-called class ii mhcs-that present the peptides, as if on a special molecular platter, to the t cell (harding and unanue, 1990). but it's here that the immune response diversifies and forks; it assumes a second wing of attack. a second subclass of t cells, called cd4 positive cells, senses these mhc-ii carrier-mounted peptide fragments. instead of killing the infected cell, the cd4 t cell incites b cells to start synthesizing antibodies. it secretes chemical substances, including cytokines, that amplify the macrophage's capacity to become mobile and phagocytose; it causes an upsurge of local blood flow and summons yet other immune cells to challenge the infection. in the absence of the cd4 cell, the transition between the detection of a pathogen and antibody production by b cells falls apart. for all these properties-and especially for supporting the b cell antibody response-this type of cell is called the ''helper'' t cell. there's a final type of immune cell that deserves mention.
in 1973, ralph steinman, working at the rockefeller university in new york, looked down a microscope and found cells in lymph nodes that ''assume a variety of branching forms, and constantly extend and retract many fine cell processes''-like a mobile, many-branched tree. ''dendritic cells,'' as steinman named them (after the greek word for ''tree'') are professionally designed to present antigens to t cells and jumpstart an immune response (steinman and cohn, 1973). in a sense, the discovery of dendritic cells brings us back, full circle, to the kampf between pathogens and the immune system and to the origins of immunology. the history of immunology forms a strange circle: it returns to rediscover its origins. the century that followed metchnikoff's discovery of macrophages-from the 1880s to 1980s-was dominated by antibodies, b cells, and t cells. these responders to infection are ''adaptive'', i.e., they arise, on command, to attack specific pathogens. but in evolutionary terms, this adaptive immunity is a relative newcomer. amidst the buzz and excitement of b and t cells, a more ancient wing of the immune system-the so-called ''innate'' system-was largely forgotten and ignored. dendritic cells and macrophages, among several other cell-types, are part of this innate immune system. these cells possess receptors, including a family called toll-like receptors, or tlrs, that do not recognize specific pathogens but molecular ''patterns'' common to pathogens in general. these patterns are chemicals carried or released by viruses and bacteria when they enter the body or infect a cell, including components of the bacterial cell wall or forms of viral rna (these pathogen-induced, pattern-recognition receptors and the signals activated by them were described and discovered by many scientists. among them, bruce beutler, jules hoffmann, charles janeway, and ruslan medzhitov deserve special mention).
prompted by signals from these pattern recognition receptors, the cells of the innate immune system release specific signals and chemicals-interferons, among them-to stir up an anti-viral and inflammatory response. they are the first responders to infections-and yet, ironically, among the last to be fully acknowledged, or understood, as essential parts of the organismal physiology of the immune response. i am an immunologist-turned-virologist-turned-internist-turned-oncologist-turned-writer-turned-historian (which is to say: i have mastered the science of lack of expertise). but i am also a new york doctor who experienced the devastating brunt of the sars-cov-2 epidemic through the stories of my patients, nurses, and colleagues. i present this history-cursory, abbreviated, and familiar, perhaps, to many readers-with due humility to capture two contrasting points. first: to illuminate how richly the past century of immunological research has contributed to our understanding of the typical response to viruses and some pathogens. but second, and conversely: to highlight how poorly we understand the physiological consequences of the immune response to sars-cov-2. the power of science lies in its ability to dissect physiological phenomena into their component pieces. but the sars-cov-2 pandemic has illustrated that reassembling those pieces to understand immune physiology at an organismal level remains elusive, particularly for this virus. take, for instance, just three of the many mysteries of sars-cov-2 infection that we are still trying to solve. first: what determines the strength and durability of an immune response to the virus? it's a question of seminal importance to vaccine developers, and yet, definitive answers are missing. in a paper published in nature, michel nussenzweig and his colleagues dissected the immune response to sars-cov-2 infection (robbiani et al., 2020).
nearly one-third of infected patients, they found, produced very low amounts (or ''titers'') of neutralizing antibodies to the virus. i asked nussenzweig, one of the most knowledgeable immunologists in the field, about the relevance of these sluggish antibody responses. do individuals with low-titer antibodies have fewer memory b cells to combat a future infection? can they be re-infected-and if so, would they suffer milder disease? and could such re-infected individuals carry enough virus to infect the immunologically naive population? or take an even more basic question that has enormous epidemiological and public policy significance: is there a level, or threshold, of viral load that a patient must carry in order to infect others? in other words, is there a difference between the infected and the infectious (if so, more stringent isolation protocols might be deployed on those that are infectious until they clear the virus)? nussenzweig doesn't know-and nor, of course, does the whole field. and what about t cells? some of the vaccines currently in late-phase trials elicit t cell responses, while the nature and strength of the t cell response for some vaccine candidates remains unknown. does it matter? does it influence the efficacy or durability of the vaccine? we don't know. and there's an odd finding that keeps cropping up: some people-up to forty percent in some studies-possess t cells that ''cross-recognize'' sars-cov-2-infected cells because these people have been previously infected by other, related common-cold coronaviruses that share genetic similarities. could these people be partially protected? we don't know. more generally, why do infections by some viruses, or inoculation with some vaccines, precipitate durable, long-term responses, while the immunity to others wanes over time, causing re-infections, and requiring ''boosters'' for continuous immunity? we don't know.
despite decades of research on the immune response to viruses, fundamental questions about vaccine development, immune durability, and the physiology of the anti-viral response in the human organism remain unsolved. second: why do some people recover from infection, while others progress to a fulminant, deadly disease? are there host factors that predict severe disease? an intriguing dutch study implicated one gene: tlr7. this x-linked gene was mutated in two pairs of brothers who suffered an atypically severe form of covid-19 for their age (one pair was found to have a deletion of the gene, while the other pair had a single amino acid change) (van der made et al., 2020). tlr7 is one of the receptors involved in the innate immune response to viruses. when cells from the peripheral blood of these brothers were challenged with chemical signals that activate tlr7, the production of interferons (particularly a subtype termed type i), and interferon-related genes, was blunted, especially in the pair of brothers with the deletion in tlr7. a separate study from a team in paris converged on similar results (hadjadj et al., 2020). the team profiled fifty virus-infected patients and eighteen controls. and again, in patients with the most severe forms of the disease, the expression of type i interferon was blunted, while the blood levels of other inflammatory cytokines, such as interleukin 6 and tumor necrosis factor α, were increased. akiko iwasaki's group at yale also profiled a large cohort of patients with moderate or severe infection and compared them to healthy controls (lucas et al., 2020).
the sustained activation of certain patterns of chemokines and cytokines was correlated with severe illness-a phenomenon that iwasaki has termed ''immunological misfiring.'' a more recent paper, published in cell, also implicated dysfunctions in innate immune cells, particularly myeloid cells such as neutrophils and monocytes, in patients with severe covid infection (schulte-schrepping et al., 2020) . to read these papers is to glimpse a code, or a pattern, behind them-but to be unable to find the code-breaking algorithm. the rosetta stone is missing. one possibility is that type 1 interferons produced by lung cells (possibly by lung-resident immune cells, including dendritic cells) are necessary for initial resistance. a blunted response fails to control the virus and predicts worse disease. once the infection progresses, though, innate cells such as monocytes produce the dysfunctional cytokine storm-the immunological misfiring that iwasaki describes. i asked iwasaki and medzhitov to reconcile these various studies. ''there appears to be a fork in the road to immunity to covid-19 that determines disease outcome,'' iwasaki told me. ''if you mount a robust innate immune response during the early phase of infection, you control the virus and have a mild disease. if you don't, you have uncontrolled virus replication in the lung that result[s] in misfiring of the immune response that fuels the fire of inflammation leading to severe disease'' (a. iwasaki, personal communication). but overall, the data suggest that innate cells, interferons, and a dysregulation of the intricate networks of signals that connect immune cells are somehow involved. again, though, these studies illustrate the fact that our understanding of the organismal physiology of this viral infection lacks the detail and resolution that are required to understand sars-cov-2 infection at a granular, mechanistic level. finally: what about the diffuse, systemic manifestations of sars-cov-2 infection? 
there are systemic physiological effects of sars-cov-2 infection that remain mysterious. some infected children experience an autoimmune illness similar to kawasaki's disease (jones et al., 2020). why? we don't know. microstructural changes have been found in the brains of some affected patients (filatov et al., 2020); there are cardiac, vascular, and autoimmune sequelae of the infection that we don't understand. many infected adults have blood clotting disorders that require the use of anti-clotting medicines (al-samkari et al., 2020). the pandemic has energized us, yes, but it has also provided a necessary dose of humility. it has also been a call to action. it is time, as mary lasker would have it, to return to research, to reflection, to revision (''[i]f you think research is expensive, try disease''). we have learned so much. we have so much left to learn. this article was commissioned and paid for by the lasker committee. the paragraph on the chinese and sudanese practice of variolation is adapted from the new yorker (mukherjee, 2020) and will appear in a forthcoming book by s.m. s.m. is a co-founder of vor, myeloid, immuneel, faeth, and cura therapeutics and serves on the boards of frequency, trialspark, equillium, cellenkos, and puretech.
covid-19 and coagulation: bleeding and thrombotic manifestations of sars-cov-2 infection
the history and traditional treatment of smallpox in the sudan
ueber das zustandekommen der diphtherie-immunität und der tetanus-immunität bei thieren
experimental studies on the role of plasma cells as antibody producers
a modification of jerne's theory of antibody production using the concept of clonal selection
delineation of the thymic and bursal lymphoid systems in the chicken
studies on structural units of the γ-globulins
experimentelle untersuchungen über immunität
plasma cellular reaction and its relation to the formation of antibodies in vitro
neurological complications of coronavirus disease (covid-19): encephalopathy. cureus
elie metchnikoff: father of natural immunity
impaired type i interferon activity and inflammatory responses in severe covid-19 patients
who discovered smallpox vaccination? edward jenner or benjamin jesty?
quantitation of antigen-presenting cell mhc class ii/peptide complexes necessary for t-cell stimulation
the vaccinators: smallpox, medical knowledge, and the opening of japan
covid-19 and kawasaki disease: novel virus and novel case
do antigens bear instructions for antibody specificity, or do they select cell lines that arise by mutation?
origin of the terms 'antibody' and 'antigen'
longitudinal analyses reveal immunological misfiring in severe covid-19
on the present state of the question of immunity in infectious diseases
ueber eine sprosspilzkrankheit der daphnien. beitrag zur lehre über den kampf der phagozyten gegen krankheitserreger. archiv für pathologische anatomie und
the early work on the discovery of the function of the thymus, an interview with jacques miller
how does the coronavirus behave inside a patient?
the new yorker
the hydrolysis of rabbit γ-globulin and antibodies with crystalline papain
edward jenner and the history of smallpox and vaccination
convergent antibody responses to sars-cov-2 in convalescent individuals
deutsche covid-19 omics initiative (decoi) (2020). severe covid-19 is marked by a dysregulated myeloid cell compartment
identification of a novel cell type in peripheral lymphoid organs of mice. i. morphology, quantitation, tissue distribution
the epitopes of influenza nucleoprotein recognized by cytotoxic t lymphocytes can be defined with short synthetic peptides
presence of genetic variants among young men with severe covid-19
restriction of in vitro t cell-mediated cytotoxicity in lymphocytic choriomeningitis within a syngeneic or semiallogeneic system

key: cord-326916-bakwk4tm authors: fauver, joseph r.; petrone, mary e.; hodcroft, emma b.; shioda, kayoko; ehrlich, hanna y.; watts, alexander g.; vogels, chantal b.f.; brito, anderson f.; alpert, tara; muyombwe, anthony; razeq, jafar; downing, randy; cheemarla, nagarjuna r.; wyllie, anne l.; kalinich, chaney c.; ott, isabel m.; quick, joshua; loman, nicholas j.; neugebauer, karla m.; greninger, alexander l.; jerome, keith r.; roychoudhury, pavitra; xie, hong; shrestha, lasata; huang, meei-li; pitzer, virginia e.; iwasaki, akiko; omer, saad b.; khan, kamran; bogoch, isaac i.; martinello, richard a.; foxman, ellen f.; landry, marie l.; neher, richard a.; ko, albert i.; grubaugh, nathan d. title: coast-to-coast spread of sars-cov-2 during the early epidemic in the united states date: 2020-05-07 journal: cell doi: 10.1016/j.cell.2020.04.021 sha: doc_id: 326916 cord_uid: bakwk4tm the novel coronavirus sars-cov-2 was first detected in the pacific northwest region of the united states in january 2020, with subsequent covid-19 outbreaks detected in all 50 states by early march.
to uncover the sources of sars-cov-2 introductions and patterns of spread within the united states, we sequenced nine viral genomes from early reported covid-19 patients in connecticut. our phylogenetic analysis places the majority of these genomes with viruses sequenced from washington state. by coupling our genomic data with domestic and international travel patterns, we show that early sars-cov-2 transmission in connecticut was likely driven by domestic introductions. moreover, the risk of domestic importation to connecticut exceeded that of international importation by mid-march regardless of our estimated effects of federal travel restrictions. this study provides evidence of widespread sustained transmission of sars-cov-2 within the united states and highlights the critical need for local surveillance. a novel coronavirus, known as sars-cov-2, was identified as the cause of an outbreak of pneumonia in wuhan, china, in december 2019 (gorbalenya et al., 2020; wu et al., 2020; zhou et al., 2020) . travel-associated cases of coronavirus disease 2019 were reported outside of china as early as january 13, 2020, and the virus has subsequently spread to nearly all nations (world health organization, 2020a ). the first detection of sars-cov-2 in the united states was a travel-associated case from washington state on january 19, 2020 (centers for disease control and prevention, 2020a) . the majority of early covid-19 cases in the united states were (1) associated with travel to a ''high-risk'' country or (2) close contacts of previously identified cases according to the testing criteria adopted by the centers for disease control and prevention (cdc) (centers for disease control and prevention, 2020b) . in response to the risk of more travel-associated cases, the united states placed travel restrictions on multiple regions with sars-cov-2 transmission, including china on january 31, iran on february 29, and europe on march 11 (taylor, 2020) . 
however, community transmission of sars-cov-2 was detected in the united states in late february, when a california resident contracted the virus despite meeting neither testing criterion (moon et al., 2020). from march 1-19, 2020, the number of reported covid-19 cases in the united states rapidly increased from 74 to 13,677, and the virus was detected in all 50 states (dong et al., 2020). it was recently estimated that the true number of covid-19 cases in the united states is likely in the tens of thousands (perkins et al., 2020), suggesting substantial undetected infections and spread within the country. we hypothesized that, with the growing number of covid-19 cases in the united states and the large volume of domestic travel, new united states outbreaks are now more likely to result from interstate rather than international spread. because of its proximity to several high-volume airports, southern connecticut is a suitable location in which to test this hypothesis. by sequencing sars-cov-2 from local cases and comparing their relatedness to virus genome sequences from other locations, we used ''genomic epidemiology'' (grubaugh et al., 2019a) to identify the likely sources of sars-cov-2 in connecticut. we supplemented our viral genomic analysis with airline travel data from major airports in southern new england to estimate the risk of domestic and international importation therein. our data suggest that the risk of domestic importation of sars-cov-2 into this region now far outweighs that of international introductions regardless of federal travel restrictions and provide evidence for coast-to-coast sars-cov-2 spread in the united states. to delineate the roles of domestic and international virus spread in the emergence of new united states covid-19 outbreaks, we sequenced sars-cov-2 viruses collected from cases identified in connecticut.
our phylogenetic analyses showed that the outbreak in connecticut was caused by multiple virus introductions and that most of these viruses were related to those sequenced from other states rather than international locations (figure 1). we sequenced sars-cov-2 genomes from nine of the first covid-19 cases reported in connecticut, with sample collection dating from march 6-14, 2020 (data s1). these individuals are residents of eight different cities in connecticut. according to the connecticut state department of public health, none of the cases were associated with international travel. using our amplicon sequencing approach, ''primalseq'' (grubaugh et al., 2019b; quick et al., 2017), with the portable oxford nanopore technologies (ont) minion platform, we generated the first sars-cov-2 genome approximately 14 h after receiving the sample (ct-yale-006), demonstrating our ability to perform near-real-time clinical sequencing and bioinformatics. our complete workflow included rna extraction, pcr testing, validation of pcr results, library preparation, sequencing, and live base calling and read mapping. we shared the genomes of these viruses publicly as we generated them (gisaid epi_isl_416416-416424). we combined our genomes with other publicly available sequences for a final dataset of 168 sars-cov-2 genomes (figure 1; data s2). the dataset can be visualized on our ''community'' nextstrain page (https://nextstrain.org/community/grubaughlab/ct-sars-cov-2/paper1). we built phylogenetic trees using a maximum likelihood reconstruction approach, and we used shared nucleotide substitutions to assess clade support (figure 1; data s3). our first nine sars-cov-2 genomes clustered into three distinct phylogenetic clades, indicating multiple independent virus introductions into connecticut.
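the idea of using shared nucleotide substitutions as evidence for a clade can be illustrated with a minimal python sketch. this is not the pipeline used in the study (the trees were built by maximum likelihood); it only assumes that genomes are already aligned to a common reference, and the function names, the toy reference, and the toy genomes are all hypothetical.

```python
from collections import defaultdict

VALID_BASES = set("acgt")


def substitutions(reference, genome):
    """Return the set of (position, ref_base, alt_base) differences, 1-based.

    Sites where either sequence has a gap or an ambiguous base (e.g. "n")
    are skipped so that low-coverage positions are not counted as changes.
    """
    if len(reference) != len(genome):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.lower(), genome.lower())
    return {
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(pairs, start=1)
        if ref != alt and ref in VALID_BASES and alt in VALID_BASES
    }


def shared_substitutions(reference, genomes, min_genomes=2):
    """Map each substitution to the genomes carrying it, keeping only
    substitutions found in at least `min_genomes` genomes."""
    carriers = defaultdict(set)
    for name, sequence in genomes.items():
        for sub in substitutions(reference, sequence):
            carriers[sub].add(name)
    return {sub: names for sub, names in carriers.items() if len(names) >= min_genomes}


# Hypothetical toy alignment (10 sites), not real sars-cov-2 data.
toy_reference = "acgtacgtac"
toy_genomes = {
    "toy_a": "atgtacgtac",  # c2t
    "toy_b": "atgtacgaac",  # c2t and t8a
    "toy_c": "acgtacgtan",  # ambiguous base only, ignored
}

shared = shared_substitutions(toy_reference, toy_genomes)
for (pos, ref, alt), names in sorted(shared.items()):
    print(f"{ref}{pos}{alt} shared by {sorted(names)}")
```

in this toy case only the substitution carried by both toy_a and toy_b is reported; a substitution seen in a single genome says nothing about grouping, which is why a minimum carrier count is applied.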
our sars-cov-2 genome ct-yale-001 clusters closely with other viruses sequenced from asia (china), whereas the close genetic relatedness of genomes from europe and washington state in the clade that contains ct-yale-006 makes it difficult to track the origins of this virus ( figure 1a ). regardless, neither the ct-yale-001 nor the ct-yale-006 covid-19 cases were travel-associated, which indicates that these patients were part of domestic transmission chains that stemmed from undetected introductions. the other seven sars-cov-2 genomes clustered with a large, primarily united states clade, within which the majority of genomes were sequenced from cases in washington state ( figure 1b ). because of a paucity of sars-cov-2 genomes from other regions within the united states, we could not determine the exact domestic origin of these viruses in connecticut. we also cannot yet determine whether the higher number of substitutions observed in ct-yale-007 and ct-yale-008 ( figure 1b ) compared with the other connecticut virus genomes within this clade was the result of multiple introductions or of significant undersampling. however, given that seven of our nine connecticut sars-cov-2 genomes fell within this clade versus the many other international clades, these were most likely the result of a common domestic source(s) rather than repeated international introductions. importantly, our data indicate that, by early to mid-march, there had already been interstate spread during the early covid-19 epidemic in the united states. our phylogenetic analysis shows that the covid-19 outbreak in connecticut was driven, in part, by domestic virus introductions. to compare the roles of interstate and international sars-cov-2 spread in the united states, we used airline travel data and the epidemiological dynamics in regions where travel routes originated to evaluate importation risk. 
figure 1. the covid-19 outbreak in connecticut is phylogenetically linked to sars-cov-2 from washington
(a) we constructed a maximum-likelihood tree using 168 global sars-cov-2 protein coding sequences, including 9 sequences from covid-19 patients identified in connecticut from march 6-14, 2020. the total number of nucleotide differences from the root of the tree quantifies evolution since the putative sars-cov-2 ancestor. we included clade-defining nucleotide substitutions to directly show the evidence supporting phylogenetic clustering. the number of sars-cov-2 genomes used in this phylogenetic tree from each location is shown in parentheses.
(b) we enlarged the united states clade consisting primarily of sars-cov-2 sequences from washington state and connecticut. the map shows the location and number of sars-cov-2 genomes that cluster within this clade. the minion sequencing statistics are enumerated in data s1, and the sars-cov-2 sequences used and author acknowledgments can be found in data s2. a root-to-tip plot showing the genetic diversity and substitution rate of the data can be found in figure s1. the genomic data can be visualized and interacted with at https://nextstrain.org/community/grubaughlab/ct-sars-cov-2/paper1.

we found that, because of the large volume of daily domestic air passengers, the dominant importation risk into the connecticut region switched from international to domestic by early to mid-march (figure 2). we first estimated daily passenger volumes arriving in the region from the five countries (china, italy, iran, spain, and germany) and out-of-region states (washington, california, florida, illinois, and louisiana) that have reported the most covid-19 cases to date (figures 2a-2d). by march 18, the five countries comprised 78% of reported non-united states cases, whereas the five states comprised 48% of reported domestic cases outside of connecticut and new york.
to this end, we collected passenger volumes arriving in three major airports in southern new england: bradley international airport (bdl; hartford, connecticut), general edward lawrence logan international airport (bos; boston, massachusetts), and john f. kennedy international airport (jfk; new york, new york; figure 2b). because travel data for 2020 are not yet available, we calculated the total passenger volume from each origin and destination pair between january and march 2019, and estimated the number of daily passengers. we found that the daily domestic passenger volumes were ~100 times greater than international in hartford, ~10 times greater in boston, and ~4 times greater in new york in our dataset (figure 2b).

figure 2.
(b) we selected three international airports in the region that are commonly used by connecticut residents: hartford (bdl), boston (bos), and new york (jfk). we used data from january to march 2019 to estimate relative differences in daily air passenger volumes from the selected origins to the airport destinations. these daily estimates were then combined by either international or domestic travel.
(c and d) the cumulative number of daily covid-19 cases was divided by 100,000 population to calculate normalized disease prevalence for each international location (china, italy, iran, spain, and germany) (c). the cumulative number of daily covid-19 cases was divided by 100,000 population to calculate normalized disease prevalence for each domestic location (washington, california, florida, illinois, and louisiana) (d).
(e) we calculated importation risk by modeling the number of daily prevalent covid-19 cases in each potential importation source and then estimating the number of infected travelers using the daily air travel volume from each location. the data, criteria, and analyses used to create this figure can be found in data s3.
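the daily passenger estimate can be sketched in python. the sketch assumes, as a simplification not stated in the text, that the january to march 2019 total is spread evenly across the days of that period; the passenger totals below are hypothetical placeholders chosen only to make the arithmetic easy to follow, not values from data s3.

```python
from datetime import date


def daily_passengers(total_passengers, start, end):
    """Average passengers per day over an inclusive date range."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end date must not precede start date")
    return total_passengers / days


# January 1 to March 31, 2019 is 31 + 28 + 31 = 90 days.
period_start = date(2019, 1, 1)
period_end = date(2019, 3, 31)

# Hypothetical quarterly totals for one destination airport.
domestic_total = 90_000
international_total = 900

domestic_daily = daily_passengers(domestic_total, period_start, period_end)
international_daily = daily_passengers(international_total, period_start, period_end)
ratio = domestic_daily / international_daily

print(f"domestic: {domestic_daily:.0f}/day, international: {international_daily:.0f}/day")
print(f"domestic volume is {ratio:.0f} times the international volume")
```

because both totals are divided by the same number of days, the domestic-to-international ratio depends only on the totals; the per-day figures matter later, when they are multiplied by a daily prevalence.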
by combining daily passenger volumes ( figure 2b ) with covid-19 prevalence at the travel route origin (figures 2c and 2d) and accounting for differences in reporting rates, we found that the domestic and international sars-cov-2 importation risk started to increase dramatically at the beginning of march 2020 ( figure 2e ). without accounting for the effects of international travel restrictions, our estimated domestic importation risk from the selected five states surpassed the international importation risk by march 10. using previous assumptions around travel restrictions (chinazzi et al., 2020) , we also modeled two scenarios where federal travel restrictions reduced passenger volume by 40% and by 90% from the restricted countries ( figure 2e ). because of the overall low prevalence of covid-19 in china, we did not find any significant effects of travel restrictions from china that were enacted on february 1 (data s3). also, we did not find significant changes to the importation risk following travel restrictions from iran on march 1, likely because of the relatively small number of passengers arriving from that country (data s3). although we did find a dramatic decrease in international importation risk following the restrictions on travel from europe (march 13), this decrease occurred after our estimates of domestic travel importation risk had already surpassed that of international importation ( figure 2e ). the dramatic rises in domestic and international importation risk preceded the state-wide covid-19 outbreak in connecticut ( figure 2e ), and the recent increase in risk of domestic importation may give rise to new outbreaks in the region. the combined results of our genomic epidemiology and travel pattern analyses suggest that domestic spread recently became a significant source of new sars-cov-2 infections in the united states. 
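the way passenger volume, prevalence, reporting rate, and travel restrictions combine can be sketched in python. this is an illustrative simplification, not the model in data s3: it assumes the expected number of infected travelers per day is the per-capita prevalence at the origin (reported cases corrected by an assumed reporting rate) multiplied by daily passenger volume, with a restriction modeled as a fractional cut in volume. the function name, parameters, and all numbers are hypothetical.

```python
import math


def expected_infected_travelers(prevalent_cases, population, daily_passengers,
                                reporting_rate=1.0, volume_reduction=0.0):
    """Expected infected arrivals per day from one origin.

    reporting_rate: assumed fraction of true cases that are reported (0 < r <= 1).
    volume_reduction: assumed fractional cut in passenger volume due to
        travel restrictions (0.4 and 0.9 mirror the two scenarios in the text).
    """
    if not 0 < reporting_rate <= 1:
        raise ValueError("reporting_rate must be in (0, 1]")
    if not 0 <= volume_reduction <= 1:
        raise ValueError("volume_reduction must be in [0, 1]")
    estimated_true_cases = prevalent_cases / reporting_rate
    prevalence = estimated_true_cases / population
    return prevalence * daily_passengers * (1 - volume_reduction)


# Hypothetical origin: 1,000 reported prevalent cases, 1 million residents,
# half of cases reported, 500 passengers per day to the destination region.
scenarios = {"no restriction": 0.0, "40% reduction": 0.4, "90% reduction": 0.9}
risk = {
    label: expected_infected_travelers(1_000, 1_000_000, 500,
                                       reporting_rate=0.5,
                                       volume_reduction=cut)
    for label, cut in scenarios.items()
}
for label, value in risk.items():
    print(f"{label}: {value:.2f} expected infected travelers per day")
```

summing this quantity over the international origins and, separately, over the domestic origins gives two curves of the kind compared in the text; the crossover date depends on the assumed reporting rates and volume reductions.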
we find strong evidence that outbreaks on the east coast (connecticut) are linked to outbreaks on the west coast (washington), demonstrating that trans-continental spread has already occurred. as of march 25, there are more than 1,000 sars-cov-2 genomes sequenced from around the world, including more than 350 from the united states (https://nextstrain.org/ncov); however, most of the latter were obtained from a small number of states. therefore, we cannot determine the exact origins of the viral introductions into connecticut. recent domestic travel history of the nine reported cases was not available, but it is unlikely that all of the infections originated in washington state. furthermore, because of low genetic diversity between these early sequences from connecticut and washington, we cannot yet quantify the rate at which the virus may be spreading between the united states coasts or whether an introduction from a common source is responsible for phylogenetic grouping. there are likely other large, multi-state phylogenetic sars-cov-2 clades that exist in the united states. as testing capacity increases and more viral genome sequences become available from new locations, more granular reconstructions of virus spread throughout the united states will be possible (grubaugh et al., 2019a). specifically, elucidating the phylogenetic relationship of viral genomes collected in connecticut to those collected in neighboring states, especially states with a high burden of disease, like new york, will improve our understanding of critical interstate dynamics. our estimates of domestic importation risk are likely conservative despite some important limitations of our air travel analysis. because we do not have access to current airline data, we could not exactly quantify the effect of government restrictions on international travel.
in addition, even without explicit government restrictions, general social distancing and work-from-home guidelines are reducing all airline travel. by using airline data available from 2019, we did not account for these decreases in our international or domestic travel patterns. although such variations may lower our domestic risk estimates, we also did not account for the large volumes of regional automobile and rail travel, especially along the corridor that connects massachusetts, new york, new jersey, pennsylvania, and washington d.c. to connecticut. we do not believe that connecticut is more closely connected to its neighbors than states in other regions of the country. therefore, our risk estimates indicate that this interconnectedness will perpetuate the domestic spread of sars-cov-2 and that domestic spread will likely become the primary source of new infections in the united states. we argue that, although simplistic, our model demonstrates the urgent need to focus control efforts in the united states on preventing further domestic virus spread. as this epidemic progresses, domestic introductions of the virus could undermine control efforts in areas that have successfully mitigated local transmission. in china, local outbreak dynamics were highly correlated with travel between wuhan and the outbreak dynamics therein during the early months of the epidemic. similarly, if interstate introductions are not curtailed in the united states with improved surveillance measures, more robust diagnostic capabilities, and proper clinical care, quelling local transmission within states will be a sisyphean task. we therefore propose that a unified effort to detect and prevent new covid-19 cases will be essential for mitigating the risk of future domestic outbreaks. this effort must ensure that states have sufficient personal protective equipment, sample collection materials, and testing reagents because these supplies enable effective surveillance.
finally, state- and local-level policymakers must recognize that the health and well-being of their constituents are contingent on that of the nation. if spread between states is now occurring, as our results indicate, then the united states will struggle to control covid-19 in the absence of a unified surveillance strategy. detailed methods are provided in the online version of this paper and include the following: the authors of this study would like to acknowledge s. cordey, i. eckerle, and l. kaiser from geneva university hospital for directly sharing their genome sequence data with our team; everyone who openly shared their genomic data on genbank and gisaid (authors listed in data s2); d. a.l.w. is the principal investigator on a research grant from pfizer to yale university and has received consulting fees for participation in advisory boards for pfizer. received: march 26, 2020 revised: april 5, 2020 accepted: april 14, 2020 published: may 5, 2020 this study did not generate new unique reagents, but raw data and code generated as part of this research can be found in the supplemental files, as well as on public resources as specified in the data and code availability section below. the accession number for the sars-cov-2 sequence data reported in this paper is ncbi bioproject:prjna614976 and gisaid: epi_isl_416416-416424. sequencing data have been made available via sra. data used to create the figures can be found in the supplemental files. the interactive nextstrain page to visualize the genomic data can be found at: https://nextstrain.org/community/grubaughlab/ct-sars-cov-2/paper1. the raw data, results, and analyses can be found at: https://github.com/grubaughlab/ct-sars-cov-2. residual de-identified nasopharyngeal samples testing positive for sars-cov-2 by reverse-transcriptase quantitative (rt-q)pcr were obtained from the yale-new haven hospital clinical virology laboratory or the connecticut state department of public health.
in accordance with the guidelines of the yale human investigations committee and the connecticut state department of public health, this work with de-identified samples is considered non-human subjects research. all samples were de-identified before receipt by the study investigators. sample collection and processing samples for this study were collected during an early testing phase by the connecticut state department of public health or the yale clinical virology laboratory at the yale school of medicine. none of the cases that we sequenced in this study were associated with international travel. all samples included in this study had ct values less than 35, sufficient volume of rna for library preparation, and were collected by march 14. as early samples were crucial for validating pcr diagnostics in multiple laboratories, the number of samples meeting these criteria was limited. nasopharyngeal swabs were collected from patients presenting with symptoms of sars-cov-2 infection at multiple medical centers in connecticut. these patients are all connecticut residents, but we do not have access to location data associated with each of these early sars-cov-2 genomes to avoid patient identification. swabs were placed in virus transport media (bd universal viral transport medium) immediately upon collection. samples (200 μl) were subjected to total nucleic acid extraction using the nuclisens easymag platform (biomérieux, france) at the yale clinical virology laboratory. the recommended cdc rt-qpcr assay was used to test for the presence of sars-cov-2 rna (centers for disease control and prevention, 2020c). a total of 10 samples from 10 different individuals met our inclusion criteria and were selected to move forward with next generation sequencing (ngs). of these, we successfully generated sequencing libraries from nine samples.
sars-cov-2 positive samples were processed for ngs using a highly multiplexed pcr amplicon approach for sequencing on the oxford nanopore technologies (ont; oxford, united kingdom) minion using the v1 primer pools (quick et al., 2017) . sequencing libraries were barcoded and multiplexed using the ligation sequencing kit and native barcoding expansion pack (ont) following the artic network's library preparation protocol (v1 primers) (quick, 2020) with the following minor modifications: cdna was generated with superscriptiv vilo master mix (thermofisher scientific, waltham, ma, usa), a total of 20 ng of each sample was used as input into end repair, end repair incubation time was increased to 25 min followed by a 1:1 bead-based clean up, and blunt/ta ligase (new england biolabs, ipswich, ma, usa) was used to ligate barcodes to each sample. cdna synthesis and amplicon generation was performed concurrently for each sample. samples were processed by ct value to reduce the likelihood of contamination from high titer samples to low titer samples. barcoding, adaptor ligation, and sequencing was performed on samples with ct values between 25-35 (low titer group) prior to samples with ct values below 25 (high titer group) (data s1). two samples, yale-006 and yale-007, were diluted 1:100 in nuclease-free water prior to cdna synthesis. a no template control was created at the cdna synthesis step and amplicon generation step to detect cross-contamination between samples. controls were barcoded and sequenced with both the high and low titer sample groups. a total of 24 ng of the low titer group was loaded onto a minion r9.4.1 flow cell and sequenced for a total of 5.5 h and generated 2.1 million reads. the flow cell was nuclease treated, flushed, and primed prior to loading 25 ng of the high titer group library. these samples were sequenced for a total of 9 h and generated 1.4 million reads (data s1). 
the rampart software from the artic network was used to monitor the sequencing run to estimate the depth of coverage across the genome for each barcoded sample in both runs (https://github.com/artic-network/rampart). following completion of the sequencing runs, .fast5 files were basecalled with guppy (v3.5.1, ont) using the high accuracy module. basecalling was performed on a single gpu node on the yale hpc. consensus genomes were generated for input into phylogenetic analysis according to the artic network bioinformatic pipeline (artic network). variants in the consensus genomes were called using nanopolish per the bioinformatic pipeline (loman et al., 2015). amplicons that were not sequenced to a depth of 20x were not included in the final consensus genome, and these positions are represented by stretches of ns (data s1). to investigate the origin and diversity of sars-cov-2 in connecticut, we compiled a dataset of our nine genomes with a representative sample of another 159 sars-cov-2 genomes that were available from genbank (https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/) and gisaid (https://www.gisaid.org/). see data s2 for a list of sequences and acknowledgments to the originating and submitting labs. no data that were released only on gisaid were used without consent from the authors (see acknowledgments). we aligned consensus genomes using the augur toolkit version 6.4.2 (hadfield et al., 2018). specifically, we aligned sequences using mafft (katoh et al., 2002), masked sites at the 5′ and 3′ ends of the alignment as well as a small number of sites that likely vary due to assembly artifacts (see https://github.com/nextstrain/ncov), and reconstructed a phylogeny using iq-tree (nguyen et al., 2015). the resulting trees were further processed using augur and treetime to add ancestral reconstructions.
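the depth filter described above can be sketched as follows. this is a minimal illustration, not the artic pipeline itself: it masks individual positions rather than whole amplicons, and the function name `mask_low_depth` and the toy inputs are hypothetical.

```python
from typing import Sequence


def mask_low_depth(consensus: str, depth: Sequence[int], min_depth: int = 20) -> str:
    """Replace every base whose read depth is below min_depth with 'N'.

    Assumption: masking is applied per position; the paper masks whole
    amplicons that fall below 20x, which this sketch only approximates.
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N" for base, d in zip(consensus, depth)
    )


# toy example (made-up sequence and depths)
print(mask_low_depth("ACGTACGT", [30, 25, 19, 5, 20, 40, 0, 21]))  # ACNNACNT
```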
the tree is rooted on the ancestor of the two genomes ''wuhan-hu-1/2019'' and ''wuhan/wh01/2019.'' sequences in this sample differ from the root by 10 or fewer nucleotide substitutions. bootstrap values are not a meaningful measure of branch support in this case. here, many of the branches are supported by one substitution, which would correspond to a bootstrap support of 0.63. for a branch supported by two substitutions the bootstrap support value would correspondingly be 0.86. given this approximate one-to-one mapping between bootstrap values and the number of substitutions, we directly show mutations supporting the major splits in the tree as it is more informative. the substitutions defining these clades are compatible with the tree topology and are not homoplastic. the probability that all clade defining substitutions arose multiple times independently in a manner compatible with the tree topology is vanishingly small. for example, with a rate of 2 nucleotide substitutions per month in a genome of length approximately 29'000 bases, the probability of this happening for any pair of six sister clades within a 2 month time frame is < 0.01. a root-to-tip plot can be found in figure s1 . the data can be visualized at: https://nextstrain.org/community/grubaughlab/ct-sars-cov-2/paper1. daily covid-19 cases from international locations were obtained from the european centre for disease prevention and control via our world in data (https://ourworldindata.org/coronavirus-source-data). international data were accessed on march 19, 2020. daily covid-19 cases from connecticut and other u.s. locations (washington, california, florida, illinois, and louisiana) were obtained from the repository (https://github.com/cssegisanddata/covid-19) hosted by the center for systems science and engineering (csse) at johns hopkins university (dong et al., 2020) . 
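the mapping between bootstrap support and the number of supporting substitutions quoted above (0.63 for one substitution, 0.86 for two) can be checked numerically. the sketch below assumes, as a simplification, that a branch is recovered in a bootstrap replicate whenever at least one of its supporting alignment columns is drawn in the resample; the function name is hypothetical.

```python
def expected_bootstrap_support(n_substitutions: int, n_sites: int = 29_000) -> float:
    """Probability that a bootstrap resample of n_sites columns (drawn with
    replacement) contains at least one of the n_substitutions supporting columns.

    Assumption: support equals this resampling probability; tree-search
    effects are ignored.
    """
    p_miss_all = (1.0 - n_substitutions / n_sites) ** n_sites
    return 1.0 - p_miss_all


for k in (1, 2):
    print(k, round(expected_bootstrap_support(k), 2))
# 1 0.63
# 2 0.86
```

for a long genome this converges to 1 - exp(-k), which is why each additional supporting substitution raises the expected support in a predictable way.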
these represent the international and out-of-region domestic (i.e., excluding new york, massachusetts, and new jersey) locations with the most reported covid-19 cases. to investigate the domestic and international spread of sars-cov-2, we obtained air passenger volumes from the international air transport association (iata; http://www.iata.org/). iata data consist of global ticket sales, which account for true origins and final destinations, and represents 90% of all commercial flights. we obtained the monthly number of passengers traveling by air from five international (china, italy, iran, spain, and germany) and five u.s. locations (washington, california, florida, illinois, and louisiana) to airports that are commonly used by connecticut residents: bradley international airport (bdl, hartford, connecticut; ranked 53rd in u.s. in yearly passenger volume; https://www.faa.gov/airports/planning_capacity/passenger_allcargo_stats/passenger/), general edward lawrence logan international airport (bos, boston, massachusetts; ranked 16th), and john f. kennedy international airport (jfk, new york, new york; ranked 6th). air passenger data from 2020 is not currently available; thus, we used data from january to march 2019 to represent general trends in passenger volumes, as done previously (bogoch et al., 2020) . we took the average of the 3-month passenger volumes to estimate the daily number of travelers entering each airport from the specified origin. to account for passenger reductions following u.s. government alerts and restrictions (taylor, 2020) , we modeled two scenarios: a 40% reduction in passenger volume and a 90% reduction in passenger volume. these thresholds were determined based on previously reported estimates and assumptions around travel restrictions (chinazzi et al., 2020) . 
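the passenger-volume step above can be sketched as follows. this is an illustration under stated assumptions: the 3-month total is divided by the 90 days of january to march 2019 to obtain a daily figure, the monthly volumes are made up, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def daily_travelers(monthly_passengers: Sequence[float], days_in_period: int = 90) -> float:
    """Average daily passengers over the period covered by the monthly counts.

    Assumption: january-march 2019 spans 31 + 28 + 31 = 90 days.
    """
    return sum(monthly_passengers) / days_in_period


def reduction_scenarios(daily: float, reductions: Sequence[float] = (0.40, 0.90)) -> Dict[float, float]:
    """Daily passengers remaining under each fractional reduction in travel."""
    return {r: daily * (1.0 - r) for r in reductions}


# made-up monthly volumes for one origin-airport pair
baseline = daily_travelers([9000, 9000, 9000])
for reduction, remaining in reduction_scenarios(baseline).items():
    print(f"{reduction:.0%} reduction: {remaining:.1f} passengers/day")
```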
we estimated the true number of incident cases per day by adjusting the number of reported incident cases to reflect the ascertainment period and reporting rate. in this adjustment, $C_t$ is the number of reported incident cases of covid-19 on day $t$, $d$ is the number of days from symptom onset to testing, and $r$ is the reporting rate. we assumed a constant ascertainment period of $d = 5$ days between symptom onset and testing (ferguson et al., 2020). because of the evidence of pre-symptomatic transmission (tindale et al., 2020), we also assumed that cases become infectious one day before symptom onset. to account for substantial uncertainty around reporting rates, we assigned different reporting rates to individual locations based on the testing criteria enacted in that location (niehus et al., 2020). for each country and state, we first extracted testing criteria from the department or ministry of health website. we assumed that countries or states with similar testing criteria policies captured similar proportions of true infections. using the respective testing criteria, we categorized countries or states as having narrow, moderate, or broad testing levels. we then assigned reporting rates to each testing level by using the mean and 95% confidence interval of the reporting rate estimated by nishiura et al. (2020): 0.092 (95% ci = 0.05-0.20). the reporting rate for the broadest testing level, $r = 0.20$, also corresponded to the reporting rate in mainland china (chinazzi et al., 2020).
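the adjustment described above can be sketched as follows. the exact equation does not survive in this text, so the form used here, $I_t = C_{t+d} / r$ (reported cases shifted back by the ascertainment period and scaled up by the reporting rate), is an assumption, as are the function name and the toy case counts.

```python
from typing import List, Sequence


def estimate_true_incidence(reported: Sequence[float], d: int = 5, r: float = 0.092) -> List[float]:
    """Estimate true incident cases per day from reported cases.

    Assumed form (not confirmed by the source): I_t = C_{t+d} / r, where
    d is the onset-to-testing delay in days and r is the reporting rate.
    Days for which C_{t+d} is not yet observed are omitted.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("reporting rate must be in (0, 1]")
    return [reported[t + d] / r for t in range(len(reported) - d)]


# made-up reported counts; broad testing level (r = 0.20)
reported_cases = [0, 0, 0, 0, 0, 1, 2, 4, 8, 16]
print(estimate_true_incidence(reported_cases, d=5, r=0.20))
```

the three reporting rates named in the text (0.05, 0.092, and 0.20) can be passed as `r` to reproduce the narrow, moderate, and broad testing levels.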
we thus assigned iran, florida, washington, and illinois to a ''narrow'' testing level ($r = 0.05$); spain, italy, and louisiana to a ''moderate'' testing level ($r = 0.092$); and china, germany, and california to a ''broad'' testing level ($r = 0.20$; data s2, ''testing-criteria''). to estimate the number of prevalent infectious individuals on day $t$ ($P_t$), we multiplied the number of incident infections up to day $t$ by the probability that an individual who became infectious on day $i$ was still infectious on day $t$: $P_t = \sum_{i \le t} I_i \, (1 - G(t-i))$, where $G(t-i)$ is the cumulative distribution function of the infectious period. we modeled the infectious period as a gamma distribution with mean 7 days and standard deviation 4.5 days, which aligns with other modeling studies (prem et al., 2020; zhao et al., 2020). we assumed that cases would not travel once they were diagnosed and therefore removed them from our estimate of infectious travelers ($T_t$): $T_t = \sum_{i \le t-5} (I_i - C_{i+d+1}) (1 - G(t-i)) + \sum_{i=t-4}^{t-1} I_i \, (1 - G(t-i)) + I_t$ (eq. 3). the first term of equation 3 accounts for the assumption that some cases had been diagnosed by day $t$ and thus would not travel. the second and third terms capture cases who are infectious on day $t$ and have not yet been diagnosed. we calculated daily risk of importation as a function of the population-adjusted density of infectious travelers and passenger volume, where $T_t$ is the number of infectious travelers on day $t$, $pop_A$ is the population of location $A$, and $n_t$ is the number of passengers traveling from each location to southern new england on day $t$. we summed the calculated risk across the three airports (bdl, bos, jfk) and then across domestic and international travelers to arrive at our final estimates. the maps presented in our figures were generated using shape files from natural earth (http://www.naturalearthdata.com/). the basemaps are open source and freely available to anyone.
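the prevalence and traveler calculations above can be sketched as follows. this is a minimal illustration under several assumptions that the text does not confirm: sums start at day 0, the first sum of equation 3 runs up to day t - 5, reported counts beyond the end of the series are treated as 0, and the importation risk is taken to be the product (T_t / pop_A) * n_t. all function names are hypothetical.

```python
import math
from typing import Sequence


def gamma_cdf(x: float, mean: float = 7.0, sd: float = 4.5) -> float:
    """CDF of a gamma distribution given its mean and sd, computed with the
    series expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def prevalent_infections(incidence: Sequence[float], t: int) -> float:
    """P_t: infections that began on days 0..t and are still infectious on day t."""
    return sum(incidence[i] * (1.0 - gamma_cdf(t - i)) for i in range(t + 1))


def infectious_travelers(incidence: Sequence[float], reported: Sequence[float],
                         t: int, d: int = 5) -> float:
    """T_t following equation 3: diagnosed cases are removed for i <= t - 5."""
    total = incidence[t]
    for i in range(max(0, t - 4), t):  # recent, not yet diagnosed
        total += incidence[i] * (1.0 - gamma_cdf(t - i))
    for i in range(0, t - 4):  # older infections, minus diagnosed cases
        j = i + d + 1
        diagnosed = reported[j] if j < len(reported) else 0.0  # assumption
        total += (incidence[i] - diagnosed) * (1.0 - gamma_cdf(t - i))
    return total


def importation_risk(travelers: float, population: float, passengers: float) -> float:
    """Assumed form: density of infectious travelers times passenger volume."""
    return travelers / population * passengers


# made-up incidence and reported series for one origin
I = [10.0, 12.0, 15.0, 20.0, 26.0, 30.0, 41.0, 50.0]
C = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0]
T7 = infectious_travelers(I, C, t=7)
print(importation_risk(T7, population=1_000_000, passengers=300))
```

summing `importation_risk` over the three airports and over origins would give the aggregate estimate described in the text.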
statistical analyses were performed using r version 3.5.2 (r core team, 2017) and are described in the figure legends and in the method details. cell 181, 1-7.e1-e4, may 28, 2020 e4 please cite this article in press as: fauver et al., coast-to-coast spread of sars-cov-2 during the early epidemic in the united states, cell (2020), https://doi.org/10.1016/j.cell.2020.04.021 figure s1 . root-to-tip plot showing the evolutionary rate of the sars-cov-2 genomes in our dataset, related to figure 1 ll article potential for global spread of a novel coronavirus from evaluating and reporting persons under investigation (pui) coronavirus disease 2019 (covid-19) cdc 2019-novel coronavirus (2019-ncov) (real-time rt-pcr diagnostic panel) the effect of travel restrictions on the spread of the 2019 novel coronavirus (covid-19) outbreak an interactive web-based dashboard to track covid-19 in real time impact of non-pharmaceutical interventions (npis) to reduce covid-19 mortality and healthcare demand tracking virus outbreaks in the twenty-first century severe acute respiratory syndrome-related coronavirus: the species and its viruses -a statement of the coronavirus study group an amplicon-based sequencing framework for accurately measuring intrahost virus diversity using primalseq and ivar nextstrain: real-time tracking of pathogen evolution mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform the effect of human mobility and control measures on the covid-19 epidemic in china. science a complete bacterial genome assembled de novo using only nanopore sequencing data the cdc has changed its criteria for testing patients for coronavirus after the first case of unknown origin was confirmed. 
cnn iq-tree: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies quantifying bias of covid-19 prevalence and severity estimates in wuhan, china that depend on reported cases in international travelers the rate of underascertainment of novel coronavirus (2019-ncov) infection: estimation using japanese passengers data on evacuation flights estimating unobserved sars-cov-2 infections in the united states the effect of control strategies that reduce social mixing on outcomes of the covid-19 epidemic in wuhan ncov-2019 sequencing protocol v1 multiplex pcr method for minion and illumina sequencing of zika and other virus genomes directly from clinical samples r: a language and environment for statistical computing (r foundation for statistical computing) treetime: maximum-likelihood phylodynamic analysis a timeline of the coronavirus pandemic. the new york times transmission interval estimates suggest pre-symptomatic spread of covid-19 novel coronavirus (2019-ncov) situation report-1 coronavirus disease 2019 (covid-19) situation report-63 a new coronavirus associated with human respiratory disease in china a mathematical model for estimating the age-specific transmissibility of a novel coronavirus a pneumonia outbreak associated with a new coronavirus of probable bat origin article star+methods key resources table resource availability lead contact further information and requests for data, resources, and reagents should be directed to and will be fulfilled by the lead contact, nathan d. grubaugh (nathan.grubaugh@yale.edu key: cord-302020-ypsh3rjv authors: kim, dongwan; lee, joo-yeon; yang, jeong-sun; kim, jun won; kim, v. narry; chang, hyeshik title: the architecture of sars-cov-2 transcriptome date: 2020-04-23 journal: cell doi: 10.1016/j.cell.2020.04.011 sha: doc_id: 302020 cord_uid: ypsh3rjv sars-cov-2 is a betacoronavirus responsible for the covid-19 pandemic. 
although the sars-cov-2 genome was reported recently, its transcriptomic architecture is unknown. utilizing two complementary sequencing techniques, we present a high-resolution map of the sars-cov-2 transcriptome and epitranscriptome. dna nanoball sequencing shows that the transcriptome is highly complex owing to numerous discontinuous transcription events. in addition to the canonical genomic and 9 subgenomic rnas, sars-cov-2 produces transcripts encoding unknown orfs with fusion, deletion, and/or frameshift. using nanopore direct rna sequencing, we further find at least 41 rna modification sites on viral transcripts, with the most frequent motif, aagaa. modified rnas have shorter poly(a) tails than unmodified rnas, suggesting a link between the modification and the 3′ tail. functional investigation of the unknown transcripts and rna modifications discovered in this study will open new directions to our understanding of the life cycle and pathogenicity of sars-cov-2. coronavirus disease 19 is caused by a novel coronavirus designated as severe acute respiratory syndrome coronavirus 2 (sars-cov-2) (zhou et al., 2020; zhu et al., 2020). like other coronaviruses (order nidovirales, family coronaviridae, subfamily coronavirinae), sars-cov-2 is an enveloped virus with a positive-sense, single-stranded rna genome of ~30 kb. sars-cov-2 belongs to the genus betacoronavirus, together with sars-cov and middle east respiratory syndrome coronavirus (mers-cov) (with 80% and 50% homology, respectively) (kim et al., 2020; zhou et al., 2020). coronaviruses (covs) were thought to primarily cause enzootic infections in birds and mammals. however, the recurring outbreaks of sars, mers, and now covid-19 have clearly demonstrated the remarkable ability of covs to cross species barriers and transmit between humans (menachery et al., 2017). covs carry the largest genomes (26-32 kb) among all rna virus families (figure 1).
each viral transcript has a 5′-cap structure and a 3′ poly(a) tail (lai and stohlman, 1981; yogo et al., 1977). upon cell entry, the genomic rna is translated to produce nonstructural proteins (nsps) from two open reading frames (orfs), orf1a and orf1b. the orf1a produces polypeptide 1a (pp1a, 440-500 kda) that is cleaved into 11 nsps. the −1 ribosome frameshift occurs immediately upstream of the orf1a stop codon, which allows continued translation of orf1b, yielding a large polypeptide (pp1ab, 740-810 kda) which is cleaved into 15 nsps. the proteolytic cleavage is mediated by viral proteases nsp3 and nsp5 that harbor a papain-like protease domain and a 3c-like protease domain, respectively. the viral genome is also used as the template for replication and transcription, which is mediated by nsp12 harboring rna-dependent rna polymerase (rdrp) activity (snijder et al., 2016; sola et al., 2015). negative-sense rna intermediates are generated to serve as the templates for the synthesis of positive-sense genomic rna (grna) and subgenomic rnas (sgrnas). the grna is packaged by the structural proteins to assemble progeny virions. shorter sgrnas encode conserved structural proteins (spike protein [s], envelope protein [e], membrane protein [m], and nucleocapsid protein [n]), and several accessory proteins. sars-cov-2 is known to have at least six accessory proteins (3a, 6, 7a, 7b, 8, and 10) according to the current annotation (genbank: nc_045512.2). however, the orfs have not yet been experimentally verified for expression. therefore, it is currently unclear which accessory genes are actually expressed from this compact genome. each coronaviral rna contains the common 5′ ''leader'' sequence of ~70 nt fused to the ''body'' sequence from the downstream part of the genome (lai and stohlman, 1981; sola et al., 2015) (figure 1).
according to the prevailing model, leader-to-body fusion occurs during negative-strand synthesis at short motifs called transcription-regulatory sequences (trss) that are located immediately adjacent to orfs (figure 1). trss contain a conserved 6-7 nt core sequence (cs) surrounded by variable sequences. during negative-strand synthesis, rdrp pauses when it crosses a trs in the body (trs-b) and switches the template to the trs in the leader (trs-l), which results in discontinuous transcription leading to the leader-body fusion. from the fused negative-strand intermediates, positive-strand mrnas are transcribed. the replication and transcription mechanism has been studied in other coronaviruses. however, it is unclear whether the general mechanism also applies to sars-cov-2 and if there are any unknown components in the sars-cov-2 transcriptome. for the development of diagnostic and therapeutic tools and the understanding of this new virus, it is critical to define the organization of the sars-cov-2 genome. deep sequencing technologies offer powerful means to investigate the viral transcriptome. the ''sequencing-by-synthesis (sbs)'' methods such as the illumina and mgi platforms confer high accuracy and coverage. however, they are limited by short read length (200-400 nt), so the fragmented sequences should be re-assembled computationally, during which the haplotype information is lost. more recently introduced is the nanopore-based direct rna sequencing (drs) approach. although nanopore drs is limited in sequencing accuracy, it enables long-read sequencing, which would be particularly useful for the analysis of long nested cov transcripts. moreover, because drs detects rna instead of cdna, the rna modification information can be obtained directly during sequencing (garalde et al., 2018). numerous rna modifications have been found to control eukaryotic rnas and viral rnas (williams et al., 2019).
terminal rna modifications such as rna tailing also play a critical role in cellular and viral rna regulation (warkocki et al., 2018). in this study, we combined two complementary sequencing approaches, drs and sbs. we unambiguously mapped the sgrnas, orfs, and trss of sars-cov-2. additionally, we found numerous unconventional rna joining events that are distinct from canonical trs-mediated polymerase jumping. we further discovered rna modification sites and measured the poly(a) tail length of grnas and sgrnas.
figure 1 (caption). from the full-length genomic rna (29,903 nt) that also serves as an mrna, orf1a and orf1b are translated. in addition to the genomic rna, nine major subgenomic rnas are produced. the sizes of the boxes representing small accessory proteins are bigger than the actual size of the orf for better visualization. the black box indicates the leader sequence. note that our data show no evidence for orf10 expression.
figure 2 (caption). (a) read counts from nanopore direct rna sequencing of total rna from vero cells infected with sars-cov-2. ''leader+'' indicates the viral reads that contain the 5′ end leader sequence. ''no leader'' denotes the viral reads lacking the leader sequence. ''nuclear'' reads match mrnas from the nuclear chromosome while ''mitochondrial'' reads are derived from the mitochondrial genome. ''control'' indicates quality control rna for nanopore sequencing. (b) genome coverage of the nanopore direct rna sequencing data shown in (a). the stepwise reduction in coverage corresponds to the borders expected for the canonical sgrnas. the smaller inner plot magnifies the 5′ part of the genome. (c) read counts from dna nanoball sequencing using mgiseq. total rna from vero cells infected with sars-cov-2 was used for sequencing. (d) genome coverage of the dna nanoball sequencing (dnb-seq) data shown in (c). see also figure s1.
to delineate the sars-cov-2 transcriptome, we first performed drs runs on a minion nanopore sequencer with total rna extracted from vero cells infected with sars-cov-2 (betacov/korea/kcdc03/2020). the virus was isolated from a patient who was diagnosed with covid-19 on january 26, 2020, after traveling from wuhan, china (kim et al., 2020). we obtained 879,679 reads from infected cells (corresponding to a throughput of 1.9 gb) (figure 2a). the majority (65.4%) of the reads mapped to sars-cov-2, indicating that viral transcripts dominate the transcriptome while the host gene expression is strongly suppressed. although nanopore drs has the 3′ bias due to directional sequencing from the 3′ ends of rnas, approximately half of the viral reads still contained the 5′ leader. the sars-cov-2 genome was almost fully covered, missing only 12 nt from the 5′ end due to the known inability of drs to sequence the terminal ~12 nt (figure 2b). the longest tags (111 reads) correspond to the full-length grna (figure 2b). the coverage of the 3′ side of the viral genome is substantially higher than that of the 5′ side, which reflects the nested sgrnas. this is also partly due to the 3′ bias of the directional drs technique. the common presence of the leader sequence (72 nt) in viral rnas results in a prominent coverage peak at the 5′ end, as expected. we could also clearly detect vertical drops in the coverage, whose positions correspond to the leader-body junction in sgrnas. all known sgrnas are supported by drs reads, with an exception of orf10 (see below). in addition, we observed unexpected reads reflecting noncanonical ''splicing'' events (figure s1). such fusion transcripts resulted in the increased coverage toward the 5′ end (figure 2b, inset). early studies on coronavirus mouse hepatitis virus reported that recombination frequently occurs (furuya and lai, 1993; liao and lai, 1992; luytjes et al., 1996). some viral rnas contain the 5′ and 3′ proximal sequences resulting from ''illegitimate'' polymerase jumping. to further validate sgrnas and their junction sites, we performed dna nanoball sequencing (dnb-seq) based on the sequencing-by-synthesis principle and obtained 305,065,029 reads with an average insert length of 220 nt (figure 2c). the results are overall consistent with the drs data. the leader-body junctions are frequently sequenced, giving rise to a sharp peak at the 5′ end in the coverage plot (figure 2d). the 3′ end exhibits a high coverage as expected for the nested transcripts. the depth of dnb-seq allowed us to confirm and examine the junctions on an unprecedented scale for a cov genome.
figure 3 (caption). (a) frequency of discontinuous mappings in the long reads from the dnb-seq data. the color indicates the number of reads with large gaps spanning between two genomic positions (starting from a coordinate in the x axis and ending in a coordinate in the y axis). the counts were aggregated into 100-nt bins for both axes. the red asterisk on the x axis indicates the column containing the leader trs. please note that the leftmost column was expanded horizontally on this heatmap to improve visualization. the red dots on the sub-plot alongside the y axis denote local peaks which coincide with the 5′ end of the body of each sgrna. (b) transcript abundance was estimated by counting the dnb-seq reads that span the junction of the corresponding rna. (c) top 50 sgrnas. the asterisk indicates an orf beginning at 27,825 that may encode the 7b protein with an n-terminal truncation of 23 amino acids. the gray bars denote minor transcripts that encode proteins with an n-terminal truncation compared with the corresponding overlapping transcript. the black bars indicate minor transcripts that encode proteins in a different reading frame from the overlapping major mrna.
we mapped the 5′ and 3′ breakpoints at the junctions and estimated the fusion frequency by counting the reads spanning the junctions (figure 3a). the leader represents the most prominent 5′ site, as expected (figure 3a, red asterisk on the x axis). the known trs-bs are detected as the top 3′ sites (figure 3a, red dots on the y axis). these results confirm that sars-cov-2 uses the canonical trs-mediated template-switching mechanism for discontinuous transcription to produce major sgrnas (figure 3b). quantitative comparison of the junction-spanning reads shows that the n rna is the most abundantly expressed transcript, followed by s, 7a, 3a, 8, m, e, 6, and 7b (figure 3c). it is important to note that orf10 is represented by only one read in dnb data (0.000009% of viral junction-spanning reads) and that it was not supported at all by drs data. orf10 does not show significant homology to known proteins. thus, orf10 is unlikely to be expressed. the annotation of orf10 should be reconsidered. taken together, sars-cov-2 expresses nine canonical sgrnas (s, 3a, e, m, 6, 7a, 7b, 8, and n) together with the grna (figures 1 and 3c). in addition to the canonical sgrnas with expected structure and length (figure 3d), our results show many minor junction sites (figures 3e-3g; table s2). there are three main types of such fusion events. the rnas in the first group have the leader combined with the body at unexpected 3′ sites in the middle of orfs or utr (figure 3e, trs-l-dependent noncanonical; table s3). the second group shows a long-distance fusion between sequences that do not have similarity to the leader (figure 3f, trs-l-independent distant). the last group undergoes local fusion, which leads to smaller deletions, mainly in structural and accessory genes, including the s orf (figure 3g, trs-l-independent local recombination). these fusion transcripts were also found in drs data (figure s2).
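the junction counting described above, with counts aggregated into 100-nt bins as in the figure 3a heatmap, can be sketched as follows. this is an illustration only: the input format (one pair of 5′ and 3′ breakpoint coordinates per junction-spanning read), the function name, and the coordinates are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def bin_junctions(junctions: Iterable[Tuple[int, int]], bin_size: int = 100) -> Counter:
    """Count junction-spanning reads per (5' bin, 3' bin) pair.

    Each bin is labeled by its start coordinate, e.g. 65 -> 0, 28255 -> 28200.
    """
    counts: Counter = Counter()
    for five_prime, three_prime in junctions:
        key = (five_prime // bin_size * bin_size, three_prime // bin_size * bin_size)
        counts[key] += 1
    return counts


# made-up breakpoints (5' site, 3' site), one tuple per read
reads = [(65, 28255), (66, 28260), (64, 28251), (70, 21551), (1500, 27000)]
binned = bin_junctions(reads)
print(binned.most_common(1))  # [((0, 28200), 3)]
```

dividing each bin count by the total number of junction-spanning reads would give the relative fusion frequency used to rank transcripts.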
we verified the expression of some of these transcripts by rt-pcr (figure s3). of note, the junctions in these noncanonical transcripts are not derived from a known trs-b. some junctions show short sequences (3-4 nt) common between the 5′ and 3′ sites, suggesting a partial complementarity-guided template switching (''polymerase jumping''). however, the majority do not have any obvious sequences. thus, we cannot exclude the possibility that at least some of these transcripts are generated through a different mechanism(s). it was previously shown in other coronaviruses that transcripts with partial sequences are produced (furuya and lai, 1993; liao and lai, 1992; luytjes et al., 1996). recent sequencing analyses also revealed non-canonical sgrnas from mouse hepatitis virus (genus betacoronavirus, subfamily coronavirinae) (irigoyen et al., 2016), hcov-229e (genus alphacoronavirus, subfamily coronavirinae) (viehweger et al., 2019), and equine torovirus (genus torovirus, subfamily torovirinae, family coronaviridae) (stewart et al., 2018), suggesting this mechanism may be at least partially conserved in coronaviridae. the functionality of these sgrnas is not clear, and some of them have been considered as parasites that compete for viral proteins, hence referred to as ''defective interfering rnas'' (di-rnas) (pathak and nagy, 2009). although the noncanonical transcripts may arise from erroneous replicase activity, it remains an open question if the fusion has an active role in viral life cycle and evolution. although individual rna species are not abundant, the combined read numbers are often comparable to the levels of accessory transcripts. most of the rnas have coding potential to yield proteins. transcripts that belong to the ''trs-l-independent distant'' group encode the upstream part of orf1a, including nsp1, truncated nsp2, and/or truncated nsp3, whose summed abundance is ~20% of grna.
depending on translation efficiency, the protein products may change the stoichiometry between nsps (figure 3f; table s4). another notable example is the 7b protein with an n-terminal truncation that may be produced at a level similar to the annotated full-length 7b (figure 3c, asterisk). frameshifted or deleted orfs may also generate shorter proteins that are distinct from known viral proteins (figure 3c). it will be interesting in the future to examine if these unknown orfs are actually translated and yield functional products. as nanopore drs is based on single-molecule detection of rna, it offers a unique opportunity to examine multiple epitranscriptomic features of individual rna molecules. we recently developed software to measure the length of poly(a) tail from drs data (y. choi and h.c., unpublished data). using this software, we confirm that, like other covs, sars-cov-2 rnas carry poly(a) tails (figures 4a-4b). the tail of viral rnas is 47 nt in median length. the full-length grna has a relatively longer tail than sgrnas. notably, sgrnas have two tail populations: a minor peak at ~30 nt and a major peak at ~45 nt (figure 4b, arrowheads). wu et al. (2013) previously observed that the poly(a) tail length of bovine cov mrnas changes during infection: from ~45 nt immediately after virus entry to ~65 nt at 6-9 hours post-infection and ~30 nt at 120-144 hours post-infection. thus, the short tails of ~30 nt observed in this study may represent aged rnas that are prone to decay. viral rnas exhibit a homogeneous length distribution, unlike host nuclear genome-encoded mrnas (figure 4c). the distribution is similar to that of mitochondrial chromosome-encoded rnas whose tail is generated by mtpap (tomecki et al., 2004). it was recently shown that hcov-229e nsp8 has an adenylyltransferase activity, which may extend poly(a) tail of viral rna (tvarogová et al., 2019).
because the poly(a) tail should be constantly attacked by host deadenylases, the regulation of viral rna tailing is likely to be important for the maintenance of genome integrity. the poly(a) tail of mrna is also generally critical for stability control and translation through its interaction with poly(a) binding proteins (pabps). cytoplasmic pabps facilitate deadenylation by the ccr4-not complex while blocking untimely decay by exosome and uridylation machinery. pabps also interact with translation initiation factors to allow translation. thus, the viral tail is likely to play multiple roles in translation, decay, and replication. next, we examined the epitranscriptomic landscape of sars-cov-2 by using the drs data. viral rna modification was first described more than 40 years ago (gokhale and horner, 2017). n6-methyladenosine (m6a) is the most widely observed modification (courtney et al., 2017; gokhale et al., 2016; krug et al., 1976; lichinchi et al., 2016; narayan et al., 1987), but other modifications have also been reported on viral rnas, including 5-methylcytosine (5mc), 2′-o-methylation (nm), deamination, and terminal uridylation. in a recent analysis of hcov-229e using drs, modification calling suggested frequent 5mc signal across viral rnas (viehweger et al., 2019). however, because no direct control group was included in the analysis, the proposed modification needed validation. to unambiguously investigate the modifications, we generated negative control rnas by in vitro transcription of the viral sequences and performed a drs run on these unmodified controls (figure s4a). the partially overlapping control rnas are ~2.3 kb or ~4.4 kb each and cover the entire length of the genome (figure s4b). detection using pre-trained models reported numerous signal level changes corresponding to 5mc modification, even with the unmodified controls (figure s4c).
we obtained highly comparable results from the viral rnas from infected cells (figure s4d). thus, the 5mc sites detected without a control are likely to be false positives. we, however, noticed intriguing differences in the ionic current (called "squiggles") between negative control and viral transcripts (figure 5a). at least 41 sites displayed substantial differences (over 20% frequency), indicating potential rna modifications (table s5). notably, some of the sites showed different frequencies depending on the sgrna species. the dwell time of the modified base (figure 5d, right) is longer than that of the unmodified base (figure 5d, left), suggesting that the modification interferes with the passing of rna molecules through the pore. among the 41 potential modification sites, the most frequently observed motif is aagaa (figures 6a and 6b). the modification sites on the "aagaa-like" motif (including aagaa and other a/g-rich sequences) are found throughout the viral genome but are particularly enriched in genomic positions 28,500-29,500 (figure 6c). long viral transcripts (grna, s, 3a, e, and m) are more frequently modified than shorter rnas (6, 7a, 7b, 8, and n) (figure 6d), suggesting a modification mechanism that is specific for certain rna species. because drs allows simultaneous detection of multiple features on individual molecules, we cross-examined the poly(a) tail length and internal modification sites. interestingly, modified rna molecules have shorter poly(a) tails than unmodified ones (figures 6e and s6; p < 9.8 × 10⁻⁵ and p < 7.3 × 10⁻¹² for orf1ab and s, respectively; mann-whitney u test). these results suggest a link between the internal modification and the 3′ end tail. because the poly(a) tail plays an important role in rna turnover, it is tempting to speculate that the observed internal modification is involved in viral rna stability control. it is also plausible that rna modification is a mechanism to evade the host immune response.
the type of modification(s) is yet to be identified, although we can exclude mettl3-mediated m6a (for lack of consensus motif rrach), adar-mediated deamination (for lack of a-to-g sequence change in the dnbseq data), and m1a (for lack of the evidence for rt stop). our finding implicates a hidden layer of cov regulation. it will be interesting in the future to identify the chemical nature, enzymology, and biological functions of the modification(s). in this study, we delineate the transcriptomic and epitranscriptomic architecture of sars-cov-2. unambiguous mapping of the expressed sgrnas and orfs is a prerequisite for the functional investigation of viral proteins, replication mechanism, and host-viral interactions involved in pathogenicity. an in-depth analysis of the joint reads revealed a highly complex landscape of viral rna synthesis. like other rna viruses, covs undergo frequent recombination, which may allow rapid evolution to change their host/tissue specificity and drug sensitivity. the frequent fusion detected in this study may provide a basis for variant generation and need to be investigated in detail. the new orfs may serve as accessory proteins that modulate viral replication and host immune response. the rna modifications may also contribute to viral survival and immune evasion in infected tissues as the innate immune system is known to be less sensitive to rnas with nucleoside modification (karikó et al., 2005) . these new molecular features will need to be studied further in animal tissues and cell types that have an intact interferon system. it is also yet to be examined if the orfs and rna modifications are unique to sars-cov-2 or conserved in other coronaviruses. comparative studies on their distribution and functional significance will help us to gain a deeper understanding of sars-cov-2 and coronaviruses in general. our data provide a rich resource and open new directions to investigate the mechanisms underlying the pathogenicity of sars-cov-2. 
see also figure s6 and table s5. detailed methods are provided in the online version of this paper and include the following: [...] (figures 6a-6d); statistical analysis of modified bases by alternative model (figures s4c and s4d); poly(a) length analysis depending on modification rates (figures 6e and s6); visualization of sequence alignment (figure s1). supplemental information can be found online at https://doi.org/10.1016/j.cell.2020.04.011. further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, v. narry kim (narrykim@snu.ac.kr). this study did not generate new unique reagents. the source code for the data processing and analyses is available at https://github.com/hyeshik/sars-cov-2-transcriptome. the sequencing data were deposited into the open science framework (osf) with an accession number https://doi.org/10.17605/osf.io/8f6n9. the processed sequencing data can be accessed from the ucsc genome browser covid-19 pandemic resources at https://genome.ucsc.edu/covid19.html. the original data for figure s3 were deposited into mendeley data: https://doi.org/10.17632/bkhbpvtg7h.1. sars-cov-2 viral rna was prepared by extracting total rna from vero cells (atcc, ccl-81) infected with betacov/korea/kcdc03/2020 (kim et al., 2020), at a multiplicity of infection (moi) of 0.05, and cultured in dmem (gibco) supplemented with 2% fetal bovine serum (gibco) and penicillin-streptomycin (gibco) at 37 °c, 5% co2. the virus is the fourth passage and not plaque-isolated. cells were harvested at 24 hours post-infection. this study was carried out in accordance with the biosafety guideline by the kcdc. the institutional biosafety committee of seoul national university approved the protocol used in these studies (snuibc-200219-10). cultured cells were washed once with pbs before adding trizol (invitrogen).
purified total rnas from non-infected and sars-cov-2-infected vero cells were treated with dnasei (takara) followed by column purification (rneasy minelute cleanup kit [qiagen]) and used for the experiments. table s1.
nanopore direct rna sequencing
for nanopore sequencing on non-infected and sars-cov-2-infected vero cells, each 4 μg of dnasei (takara)-treated total rna in 8 μl was used for library preparation following the manufacturer's instruction (the oxford nanopore drs protocol, sqk-rna002) with minor adaptations. 20 u of superase-in rnase inhibitor (ambion, 20 u/μl) was added to both adaptor ligation steps. superscript iv reverse transcriptase (invitrogen) was adopted instead of superscript iii, and the reaction time of reverse transcription was lengthened by 2 hours. the library was loaded on a flo-min106d flow cell followed by a 42-hour sequencing run on a minion device (oxford nanopore technologies). for nanopore sequencing on sars-cov-2 rna fragments produced by in vitro transcription, the same method was applied except for the rna amount (a total of 2 μg of in vitro transcribed rnas) and reaction time for reverse transcription (30 minutes). the nanopore direct sequencing data were basecalled by guppy 3.4.5 (oxford nanopore technologies) using the high-accuracy model. the sequence reads were aligned to the reference sequence database composed of the c. sabaeus genome (ensembl release 99), a sars-cov-2 genome, yeast eno2 cdna (sgd: yhr174w), and human ribosomal dna complete repeat unit (genbank: u13369.1) using minimap2 2.17 (li, 2018) with options "-k 13 -x splice -N 32 -un". we used the sequence of the wuhan-hu-1 strain (genbank: nc_045512.2) as a backbone for the viral reference genome, then corrected the four single nucleotide variants found in betacov/korea/kcdc03/2020: t4402c, g5062t, c8782t, and t28143c (gisaid: epi_isl_407193).
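the reference correction described above (four single nucleotide variants substituted into the wuhan-hu-1 backbone) can be sketched as follows. this is a minimal illustration, not the authors' code: the function name apply_snvs and the toy sequence are hypothetical, and variant positions are assumed to be 1-based.

```python
import re

# The four variants named in the text (reference base, 1-based position, new base).
KCDC03_SNVS = ["T4402C", "G5062T", "C8782T", "T28143C"]


def apply_snvs(reference, snvs):
    """Return a copy of `reference` with each SNV (e.g. "T4402C") applied.

    Positions are assumed to be 1-based. The reference base is checked before
    substitution so that a mismatched backbone fails loudly.
    """
    seq = list(reference.upper())
    for snv in snvs:
        match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", snv.upper())
        if match is None:
            raise ValueError(f"cannot parse SNV: {snv!r}")
        ref_base, pos, alt_base = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(seq):
            raise ValueError(f"position out of range: {snv!r}")
        if seq[pos - 1] != ref_base:
            raise ValueError(f"reference base mismatch at {pos}: {snv!r}")
        seq[pos - 1] = alt_base
    return "".join(seq)


# Toy example on an 8-nt sequence (hypothetical, not the viral genome).
print(apply_snvs("ACGTACGT", ["C2T", "T8C"]))  # ATGTACGC
```

with the real backbone loaded as a string, the same call with KCDC03_SNVS would produce the corrected reference, provided the backbone bases match those stated in the variant names.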
the sequence alignments were further improved by re-mapping the identified viral reads to the viral genome using minimap2 options "-k 8 -w 1 --splice -g 30000 -G 30000 -A1 -B2 -O2,24 -E1,0 -C0 -z 400,200 --no-end-flt --junc-bonus=100 -F 40000 -N 32 --splice-flank=no --max-chain-skip=40 -un --junc-bed=FILE -p 0.7". chimeric reads were filtered out according to the flag from minimap2. with 1 μg of total rna from sars-cov-2-infected vero cells, the dynabeads mrna purification kit (invitrogen) was applied to deplete rrna and enrich poly(a)+ rna by using oligo d(t). an rna-seq library for 250 bp insert size was constructed following the manufacturer's instruction (mgieasy rna directional library prep set). the library was loaded on an mgiseq-200rs sequencing flow cell with the mgiseq-200rs high-throughput sequencing kit (pe 100), and the library was run on dnbseq-g50rs (paired-end run, 100 × 100 cycles). the sequences from dnbseq were aligned to the reference sequences used in nanopore drs. we used star 2.7.3a (dobin et al., 2013) with many switches to completely turn off the penalties of non-canonical eukaryotic splicing: "--outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --outSJfilterOverhangMin 12 12 12 12 --outSJfilterCountUniqueMin 1 1 1 1 --outSJfilterCountTotalMin 1 1 1 1 --outSJfilterDistToOtherSJmin 0 0 0 0 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --scoreGapNoncan -4 --scoreGapATAC -4 --chimOutType WithinBAM HardClip --chimScoreJunctionNonGTAG 0 --alignSJstitchMismatchNmax -1 -1 -1 -1 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000".
rna-seq coverage depth plots (figures 2b and 2d)
sequencing read coverage was calculated using bedtools genomecov (version 2.27.1). the coverage depths were binned into 30-nt (wide views) or 15-nt (insets) bins, and the median of each bin was plotted.
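the median binning of coverage depth described above can be sketched with the standard library alone. this is a hedged illustration rather than the authors' plotting code: the function name is hypothetical, and treating a shorter final bin as its own bin is an assumption.

```python
from statistics import median


def bin_depth_by_median(depth, bin_size=30):
    """Summarize per-base coverage depth as one median value per bin.

    `depth` is a sequence of per-position depths; `bin_size` would be 30 for
    the wide views or 15 for the insets. A trailing partial bin is kept
    (an assumption; the text does not say how it was handled).
    """
    return [
        median(depth[start:start + bin_size])
        for start in range(0, len(depth), bin_size)
    ]


# Toy depths 0..59 give two 30-nt bins.
print(bin_depth_by_median(list(range(60)), bin_size=30))  # [14.5, 44.5]
```

medians are less sensitive than means to single-position spikes, which may be why they suit a genome with very sharp coverage changes at junctions.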
heatmaps showing discontinuous mappings (figures 3a and s2)
start and end positions of large gaps (≥20 nt) were collected from the cigar strings of all high-quality (≥100 in the star mapping quality) alignments to the viral genome. the positions were counted into 100-nt bins in the zero-based coordinate. the read counts were mapped to the colormap "viridis" in matplotlib 3.1.3 after log-transformation with a pseudocount of 1. the most frequent canonical sites (red dots in the line plots on the left-hand sides) were detected by using signal.find_peaks in scipy 1.4.1 (prominence = 4 and height = 8 for the drs data; prominence = 8 and height = 13 for the dnbseq data) (virtanen et al., 2020).
counting and classifying reads from subgenomic rnas (figures 3b and 3c)
the junction-spanning reads (jsrs) were categorized by the positions of their 5′ and 3′ sites. a jsr was marked as a leader-to-body junction when the 5′ site of the deletion is mapped to a genomic position between 55 and 85. in the cases where the 5′ site is in the 5′ utr region, the sgrna identity and the frame matching were determined by the first appearance of aug downstream of the 3′ site. in the cases where the 5′ site is in a known orf or an aug is introduced by fusion, we checked if the concatenated sequence generates a protein product with the same reading frame as a canonical orf after the 3′ site. for the analyses of sgrna reads using the nanopore drs data, the mapped reads from canonical sgrnas were identified using the start and end positions of large deletions ≥10,000 nt. for a valid assignment to a species of sgrna, we required that the start position is between 55 and 85 in the genomic coordinate. the first aug downstream of the end position of a large deletion was used for identification of the "spliced" product.
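the two rules used above to classify junction-spanning reads (a 5′ site between 55 and 85 marks a leader-to-body junction; the first aug downstream of the 3′ site identifies the product) can be sketched as below. this is an illustrative sketch under stated assumptions, not the published pipeline: the function names are hypothetical, the window is assumed to include both endpoints, and positions are treated as zero-based string indices.

```python
LEADER_5P_WINDOW = (55, 85)  # genomic window for the 5' site, from the text


def is_leader_to_body(five_prime_site, window=LEADER_5P_WINDOW):
    """True when the 5' site of the deletion falls in the leader window.

    Inclusive endpoints are an assumption; the text only says "between".
    """
    low, high = window
    return low <= five_prime_site <= high


def first_aug_downstream(genome, three_prime_site):
    """Index of the first AUG (ATG in cDNA form) at or after the 3' site.

    Returns -1 when no start codon is found. `three_prime_site` is used as a
    zero-based index into `genome` (an assumption for this sketch).
    """
    seq = genome.upper().replace("U", "T")
    return seq.find("ATG", three_prime_site)


# Toy example (hypothetical sequence, not the viral genome).
print(is_leader_to_body(70))                    # True
print(is_leader_to_body(100))                   # False
print(first_aug_downstream("CCATGCCCATGA", 3))  # 8
```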
poly(a) length distribution analysis (figures 4, 6e, and s6)
the dwell times of poly(a) tails were measured using poreplex 0.5.0 (https://github.com/hyeshik/poreplex). for the conversion from a dwell time to a nucleotide length, we divided a poly(a) dwell measurement by 1/30 of the mode of the poly(a) dwell time of the ont sequencing calibration control, which has a 30-nt-long poly(a) tail.
balancing ivt product reads and modified base detection by sample level comparison (figures 5a, 5b, s5a, and s5b)
the drs reads of the ivt rnas were downsampled to balance the coverage between different fragments that were split into equal-sized patches. the sampling frequency of a fragment was controlled by the read counts within the 100-nt bin with the lowest coverage in each fragment. we sampled the reads so that the result contains roughly 10,000 reads from every ivt fragment. the viral rna reads and the downsampled ivt reads were processed for squiggle analyses by ont tombo 1.5 (stoiber et al., 2017) with a minor tune to improve the sensitivity of sequence alignments (-k8 -w1). the modified base detection was done by using the "model_sample_compare" mode with the option "--sample-only-estimates" unless otherwise specified. the classification of sequence context near the modified sites was first done by the existence of four consecutive purine bases within 5 nt from the position with the highest modification fraction reported by tombo. then, the rest were further divided into four groups according to the nucleotide base with the highest modification fraction.
statistical analysis of modified bases by alternative model (figures s4c and s4d)
the candidate sites of 5-methylcytidine were detected using the bundled "alternative model" of tombo 1.5. figure s4c shows all positions with at least 500 supporting reads. significantly methylated sites (black dots on top) were selected by applying the 5% false-discovery rate cut-off estimated by viehweger et al. (2019).
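the dwell-time-to-length conversion described above for poly(a) tails can be illustrated with a short sketch. this is not the poreplex implementation; the function name and the example dwell values are hypothetical and chosen only to show the arithmetic: the mode of the calibration control's poly(a) dwell time is divided by 30 to give a per-nucleotide dwell, and each measurement is divided by that value.

```python
def dwell_to_tail_length(dwell, control_mode_dwell, control_tail_nt=30):
    """Convert a poly(A) dwell measurement to an estimated length in nt.

    `control_mode_dwell` is the mode of the poly(A) dwell time of the
    calibration control, whose tail is `control_tail_nt` nucleotides long.
    Units only need to be consistent between the two dwell values.
    """
    if control_mode_dwell <= 0:
        raise ValueError("control_mode_dwell must be positive")
    per_nt_dwell = control_mode_dwell / control_tail_nt
    return dwell / per_nt_dwell


# Hypothetical numbers: a control mode of 300 units means 10 units per nt,
# so a read with a dwell of 470 units maps to a 47-nt tail.
print(dwell_to_tail_length(470, 300))  # 47.0
```

using the mode of the control rather than its mean keeps the calibration anchored on the most common dwell value, so a few stalled molecules do not stretch the scale.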
figure s4d shows all positions with enough coverage depth (≥100 reads) in both ivt products and viral rnas.
poly(a) length analysis depending on modification rates (figures 6e and s6)
"highly modified" sgrna reads were detected by referring to eight modification sites that were at least 40% modified in any species of sgrnas: 28591, 28612, 28653, 28860, 28958, 29016, 29088, and 29127. we used the reads that were reported as modified at three or more sites with a statistic < 0.01 as "highly modified" reads. "not modified" reads were reported with the statistic ≥ 0.01 at all eight sites. statistical tests for shorter poly(a) length of highly modified sgrnas were carried out using the wilcox.test() function in r 3.6.1.
visualization of sequence alignment (figure s1)
to visualize the sequences mapped near the cds regions of nsp2-8, the alignments were first selected by the "intersect" command of bedtools 2.29.2 for the region 800-12000 (zero-based coordinates). the surviving alignments were filtered again by intersecting with the region 29850-29950 to enrich the 3′-intact reads. the resulting alignments were further filtered so that we only keep alignments with (1) a minimum alignment length of 1000 nt excluding insertions or deletions and (2) a minimum contiguously mapped length of 50 nt in the 5′-most block to suppress noisy short alignments. 250 randomly chosen alignments passing the criteria were sorted by the 5′ site position of the largest deletion within each alignment. alignments without a large gap (≥100 nt) were ordered by the first mapped coordinate.
epitranscriptomic enhancement of influenza a virus gene expression and replication
star: ultrafast universal rna-seq aligner
highly parallel direct rna sequencing on an array of nanopores
rna modifications go viral
n6-methyladenosine in flaviviridae viral rna genomes regulates infection
high-resolution analysis of coronavirus gene expression by rna sequencing and ribosome profiling
suppression of rna recognition by toll-like receptors: the impact of nucleoside modification and the evolutionary origin of rna
identification of coronavirus isolated from a patient in korea with covid-19
influenza viral mrna contains internal n6-methyladenosine and 5′-terminal 7-methylguanosine in cap structures
comparative analysis of rna genomes of mouse hepatitis viruses
minimap2: pairwise alignment for nucleotide sequences
defective interfering rnas: foes of viruses and friends of virologists
the nonstructural proteins directing coronavirus rna synthesis and processing
continuous and discontinuous rna synthesis in coronaviruses
transcriptional and translational landscape of equine torovirus
de novo identification of dna modifications enabled by genome-guided nanopore signal processing
identification of a novel human nuclear-encoded mitochondrial poly(a) polymerase
identification and characterization of a human coronavirus 229e nonstructural protein 8-associated rna 3′-terminal adenylyltransferase activity
direct rna nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis
scipy 1.0: fundamental algorithms for scientific computing in python
terminal nucleotidyl transferases (tents) in mammalian rna metabolism
regulation of viral infection by the rna modification n6-methyladenosine
regulation of coronaviral poly(a) tail length during infection
polyadenylate in the virion rna of mouse hepatitis virus
a pneumonia outbreak associated with a new coronavirus of probable bat origin
a novel coronavirus from patients with pneumonia in china
subgenomic rnas with large deletions between nsp2/3 and n regions. single read alignment is shown as a set of thick bars and lines connected. thick bars on the alignments indicate contiguous mappings consisting of matches, mismatches, insertions, and small deletions
we thank members of our institutions for discussion and help, particularly eunjin chang, inhye park, and young-suk lee at ibs. we are grateful to drs. jung-hye roe, nam-hyuk cho, kwangseog ahn, and jae-hwan nam for their advice and comments. we thank kyung-chang kim and sung soon kim at korea national institute of health for their support. the pathogen resource (nccp43326) for this study was provided by the national culture collection for pathogens, korea national institute of health. this work was supported by the institute for basic science from the ministry of science and ict of korea
frequency of discontinuous mappings in the long reads from the nanopore drs data. the color indicates the number of reads with large gaps spanning between two genomic positions (starting from a coordinate in the x axis and ending in a coordinate in the y axis). the counts were aggregated into 100-nt bins for both axes. the red asterisk on the x axis indicates the column containing the leader trs. please note that the leftmost column containing the leader trs was expanded horizontally on this heatmap to improve visualization. the red dots on the sub-plot alongside the y axis denote local peaks which coincide with the 5′ end of the body of each sgrna.
a, read counts from nanopore direct rna sequencing of in vitro transcribed (ivt) rnas that have viral sequences. "control" indicates quality control rna for nanopore sequencing. b, the 15 partially overlapping patches cover the entire genome (blue rectangles at the bottom). each rna is ~2.3 kb in length. one fragment marked with a green rectangle is longer than others (~4.4 kb) to circumvent difficulties in the pcr amplification.
the sequenced reads were downsampled so that every region is equally covered. the resulting balanced coverage is shown in the chart at the top. c, detected 5mc modification from in vitro transcribed unmodified rnas (ivt product) by the "alternative base detection" mode in tombo. black dots indicate the sites that satisfy the estimated false discovery rate cut-off calculated using unmodified yeast eno2 mrna (viehweger et al., 2019). d, comparison between the sites called from unmodified ivt products and those from viral rnas expressed in vero cells.
poly(a) tail length distribution of each viral transcript other than those shown in figure 6.
key: cord-263468-996kl9jz authors: cattaneo, roberto; schmid, anita; eschle, daniel; baczko, knut; ter meulen, volker; billeter, martin a. title: biased hypermutation and other genetic changes in defective measles viruses in human brain infections date: 1988-10-21 journal: cell doi: 10.1016/0092-8674(88)90048-7 sha: doc_id: 263468 cord_uid: 996kl9jz abstract we assessed the alterations of viral gene expression occurring during persistent infections by cloning full-length transcripts of measles virus (mv) genes from brain autopsies of two subacute sclerosing panencephalitis patients and one measles inclusion body encephalitis (mibe) patient. the sequence of these mv genes revealed that, most likely, almost 2% of the nucleotides were mutated during persistence, and 35% of these differences resulted in amino acid changes. one of these nucleotide substitutions and one deletion resulted in alteration of the reading frames of two fusion genes, as confirmed by in vitro translation of synthetic mrnas. one cluster of mutations was exceptional; in the matrix gene of the mibe case, 50% of the u residues were changed to c, which might result from a highly biased copying event exclusively affecting this gene.
we propose that the cluster of mutations in the mibe case, and other combinations of mutations in other cases, favored propagation of mv infections in brain cells by conferring a selective advantage to the mutated genomes. subacute sclerosing panencephalitis (sspe) is among the most thoroughly studied persistent viral infections of the human central nervous system and serves as a model for analysis of the development of persistent viral infections known or suspected to cause several human syndromes, including multiple sclerosis kristensson and norrby, 1986; dowling et al., 1986) . sspe generally develops 5 to 10 years after acute measles, starting with subtle signs of intellectual and psychological dysfunctions, continuing with sensory and motor function deterioration and progressive cerebral degeneration, and leading to death after months or years (ter . the measles inclusion body encephalitis (mibe) clinical and virological manifestations are similar to those of sspe, but the incubation time of mibe can be shorter (roos et al., 1981; ohuchi et al., 1987; . moreover, mibe is found in immunosuppressed patients, whereas sspe patients mount high antibody titers against all measles virus (mv) proteins except matrix (m protein; hall et al., 1979) . m protein is responsible for viral assembly, and it was postulated that silencing of m protein expression could account for lack of viral budding and favor persistence (hall et al., 1979) . indeed initially, m protein could not be detected in brain tissue of sspe patients (hall and choppin, 1981) , but in recent studies using monoclonal antibodies, m protein was found in diseased human brains where the viral envelope proteins fusion (f) and hemagglutinin (h) could not be detected (norrby et al., 1985; baczko et al., 1986) . thus, defective m protein expression might not be the only viral determinant correlating with persistence. 
the study of the molecular basis for defective mv gene expression in sspe concentrated initially on the cell-associated, defective mvs that can be occasionally obtained by cocultivation of brain cells of sspe patients with stable cell lines (wechsler and fields, 1978; hall et al., 1979; carter et al., 1983; sheppard et al., 1986). however, the relevance of these observations for human brain infections remains to be established, since these viruses might not be truly representative of the viruses present in infected brains (norrby et al., 1985). to assess the alterations of viral gene expression characteristic for mv persistence in diseased human brains, it appeared desirable to clone mv genes directly from brain tissue. until now, this has been accomplished only for one m gene in one sspe case, where it turned out that among many other alterations a point mutation introduced a stop codon at position 12 of the m reading frame (cattaneo et al., 1986). in the present work, using a procedure allowing selective full-length cdna cloning of mv rnas (schmid et al., 1987), and starting with only 2.5 pg of polyadenylated brain rna, we achieved cloning of at least one transcript of all the viral genes, except the large polymerase gene, from two sspe cases and one mibe case. from examination of the three sets of nucleocapsid (n), phospho (p), m, f, and h genes, we determined that, on average, the mv genomes recovered from a single brain differ from each other in 20-30 of their 16,000 bases. we also estimated that in all three cases, 200-300 mutations have been fixed during the course of persistence. about 35% of these mutations resulted in amino acid substitutions; in one m and two f genes, reading frames were grossly or slightly changed, respectively. remarkably, in the m gene of the mibe case, but in no other gene, a cluster of transitions converted 50% of the u residues to c.
we have previously characterized mv gene expression in brain autopsies of two cases of sspe (a and b) and one case of mibe (c) by immunofluorescence analysis of brain sections with monoclonal antibodies and immunoprecipitation of mv proteins translated in vitro from brain rna (baczko et al., 1986). the n and p proteins, involved in viral replication, were detected in all three cases. the m protein, responsible for viral assembly, was detected only in the brain of case a, the f protein only in case b, and the other envelope protein h only in case c. since expression of the m, f, and h mrnas is diminished in diseased brains (cattaneo et al., 1987b), it is conceivable that the failure to detect the corresponding proteins could simply be because of low mrna levels. to produce sufficient amounts of these rnas, we cloned full-length cdnas of these genes from the a, b, and c brains and from the reference mv edmonston strain in an in vitro rna expression vector (experimental procedures). synthetic transcripts were then used to direct protein synthesis in a rabbit reticulocyte lysate. figure 1 is an analysis of the proteins produced from the n, m, f, and h synthetic transcripts of cases a, b, and c, as compared with the proteins produced from the synthetic transcripts of the edmonston (e) strain. the m gene of case c produced only low levels of proteins considerably smaller than the edmonston m protein (about 27 kd, 25 kd, and 17 kd instead of 36 kd). in contrast, the other proteins that had not been detected in brain autopsy materials (the f and h proteins of case a, the m and h proteins of case b, and the f protein of case c) were produced in amounts comparable to the edmonston proteins and had approximately the expected size. this indicated that the reading frames of these genes were intact or only slightly modified. (note that single amino acid substitutions could substantially change the mobility of a protein [noel et al., 1979].)
in the case of the f genes, differences in migration were greater than in other genes, amounting to apparently 4 kd between the most rapidly migrating protein (figure 1, fusion gene, case b) and the one migrating most slowly (figure 1, fusion gene, case c). this is at the upper limit of the differences in migration observed in the proteins of defective sspe viruses (hall et al., 1979). to ascertain whether the reading frames of these genes were intact, we sequenced the ends of each clone used for expression analysis; in all genes except the m gene of case c (see below) and the f genes of cases a and b (figure 2a), the signals for initiation and termination of protein synthesis were intact. in the f gene of case a, deletion of one nucleotide at position 2153 resulted in a shift in the reading frame causing substitution of the last 27 amino acids of the edmonston f protein by 11 other residues (figure 2a; nucleotide and amino acid positions as in the convention of richardson et al. [1986]; this deletion was confirmed in two clones). this explains the higher electrophoretic mobility of this protein as noted above. the apparent molecular weight of the f protein of case b was even lower than that of case a (figure 1, fusion gene, cases a and b). this was due to the introduction of a new stop codon (uaa) by a c to u mutation at position 2161, resulting in the expression of an f protein shortened by 24 amino acids (figure 2a). thus, two of the three f genes examined expressed an f protein with a mutated c terminus.

the active f protein of paramyxoviruses is liberated from its inactive precursor by endoproteolytic cleavage after a stretch of basic amino acids (figure 2b, center), giving rise to a unique hydrophobic domain (figure 2b, right; varsanyi et al., 1985; richardson et al., 1986; glickman et al., 1988). it was demonstrated recently for the paramyxoviruses, as well as for influenza a virus and human immunodeficiency virus, that full expression of viral infectivity depends on the efficiency of proteolytic cleavage at this site (webster and rott, 1987; glickman et al., 1988; mccune et al., 1988). to identify possible sequence alterations causing inefficient f protein cleavage and thus leading to the loss of lytic function typical for mv infections of human brains, we analyzed the region of the f cdnas coding for the cleavage/activation site. however, this site was completely conserved in all three brain-derived f genes (figure 2b, bottom).

figure 1 legend. about 200 ng of synthetic mv transcripts was translated in a rabbit reticulocyte lysate (promega, madison, wi) in the presence of 35s-labeled methionine (in vivo cell labeling grade, more than 1000 ci/mmol, amersham international, england). equal amounts of the products of these reactions were loaded onto a protein gel (laemmli, 1970), which was soaked in sodium salicylate, dried, and autoradiographed (chamberlain, 1979). the rnas translated were (from left to right): brome mosaic virus rna (bmv), yielding marker proteins of 110 kd, 97 kd, 65 kd, and 20 kd; no rna (neg.); the transcripts of the genes and cases are indicated on the top. the apparent sizes of the n, m, f, and h protein products of the edmonston strains are the expected ones for proteins translated in the system, 59 kd, 36 kd, 52 kd, and 69 kd, respectively (hasel et al., 1987). proteins of higher mobility were detected in addition to the full-size n and h proteins, as observed in a previous study (hasel et al., 1987).

figure 2 legend (fragment). ... varsanyi et al. (1985) and glickman et al. (1988). for details see text.

to define the mutations introduced during persistence, ideally the sequences of the lytic viruses that infected the three children investigated should be available for comparison. it is, however, impossible to retrace these viruses, and comparison with the edmonston strain sequence would not be valid since this virus has undergone numerous passages in chick embryos and cultured cells during the process of attenuation (enders et al., 1960; schwarz, 1962). to overcome this limitation as well as possible, we compared our data with a consensus m gene sequence, indicated in figure 3 as pre, which comprises the nucleotides represented most often in nine sequences determined experimentally (three sequences of lytic mvs and six of persistent mvs), as detailed in the legend to figure 3. we will call the deviations from this consensus "mutations," although this term is not actually accurate.

as mentioned above, the m gene of case c was of particular interest because its protein product had an apparent molecular weight considerably lower than expected (figure 1, matrix gene, case c). from the sequence analysis presented in figure 3, it is immediately evident that mutations in this gene are more abundant than in other m genes and that u to c transitions account for a large majority of mutations. in fact, 132 of the 266 u residues encoded in the pre sequence (that is, about 50%) are changed to c in the case c sequence. this phenomenon resulted in the alteration of the m protein initiation signal (figure 3, positions 37 and 186-188). moreover, in clone pcm1, used for production of the synthetic rnas, a nucleotide deletion at position 756 created a frameshift resulting in the introduction of a stop codon (tag at position 793-795). taken together, these two events should lead to the production of an altered m protein with a molecular mass of about 23 kd. indeed, a major product of 25 kd is detected (figure 1, matrix gene, case c). it is also of interest that several minor proteins were produced by the synthetic pcm1 transcripts. this was probably the result of initiation of translation on downstream aug and upstream non-aug codons, characterizing translation of genes possessing a "weak" aug, like the one at position 186-188 of the m gene of case c (kozak et al., 1983; curran and kolakofsky, 1988).
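the two kinds of reading-frame damage described above (a point mutation that creates a stop codon, and a single-nucleotide deletion that shifts the frame onto a stop codon) can be illustrated with a minimal sketch. the 24-nucleotide sequence, the positions, and the function names below are hypothetical and chosen only for illustration; they are not taken from the measles virus genes.

```python
# toy illustration: the sequence below is hypothetical, not measles virus sequence
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_before_stop(rna: str) -> int:
    """Count codons read in frame from position 0 up to the first stop codon."""
    count = 0
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def substitute(rna: str, pos: int, base: str) -> str:
    """Return a copy of rna with the nucleotide at pos replaced by base."""
    return rna[:pos] + base + rna[pos + 1:]


def delete(rna: str, pos: int) -> str:
    """Return a copy of rna with the nucleotide at pos removed."""
    return rna[:pos] + rna[pos + 1:]


# AUG GCU CAA CUA GCU UUC GAA UAA: seven sense codons, then a stop
wild_type = "AUGGCUCAACUAGCUUUCGAAUAA"

# c to u change turns CAA into the stop codon UAA
point_mutant = substitute(wild_type, 6, "U")

# one-nucleotide deletion shifts the frame onto a previously hidden UAG
frameshift_mutant = delete(wild_type, 5)

for name, seq in [("wild type", wild_type),
                  ("point mutant", point_mutant),
                  ("frameshift mutant", frameshift_mutant)]:
    print(name, codons_before_stop(seq))
```

in this toy case both changes shorten the open reading frame; a frameshift can equally well remove the normal stop codon and extend the protein, depending on where the next in-frame stop falls.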
the other four sequenced clones of this m gene did not contain nucleotide deletions, and their major protein products migrated approximately at the position of the m proteins of the other cases (data not shown). this was expected since the loss of 50 amino acids from the amino terminus should be compensated by the gain of 40 at the carboxyl terminus, the gain being due to the substitution of the termination signals at position 1041-1043 by a new signal at position 1161-1163 (both underlined in figure 3). given the very high level of u to c transitions detected in the m gene of case c, we predicted that if these mutations had progressively accumulated during persistence, other genes of case c would also have accumulated similar transitions. to test this hypothesis, we sequenced the complete n gene and one third to one half of the p, f, and h genes. we also analyzed the corresponding genes of the a and b cases and compared them with consensus sequences as defined above (figure 4 indicates the genomic areas sequenced). as shown in table 1, in the m gene of case c the level of u to c transitions exceeded by a factor of at least 20 all other kinds of mutations, whereas, surprisingly, the levels of u to c transitions in all other genes of case c were comparable to the levels of the other mutations. to further investigate the distribution of u to c mutations in the mv genome, we also sequenced clones covering the whole m gene and part of the flanking genes (pcm4 and pcm6, legend to figure 3). from the graphical representation of these analyses (figure 4) it is evident that, whereas in the n, p, and h genes two or fewer transitions have been introduced per group of 20 u residues, in the m gene between 4 and 16 changes have occurred.
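the two tallies used above (counts per substitution type, and u to c changes per group of 20 consensus u residues) can be sketched as follows. this is a minimal illustration that assumes the consensus and case sequences are already aligned and of equal length; the function names and the toy sequences are hypothetical and do not reproduce the authors' procedure or data.

```python
from collections import Counter


def tally_substitutions(consensus: str, case: str) -> Counter:
    """Count each substitution type, e.g. ('U', 'C'), between two aligned sequences."""
    if len(consensus) != len(case):
        raise ValueError("sequences must be aligned and of equal length")
    return Counter(
        (c, s) for c, s in zip(consensus, case)
        if c != s and c in "ACGU" and s in "ACGU"
    )


def u_to_c_per_group(consensus: str, case: str, group_size: int = 20) -> list:
    """For successive groups of group_size consensus U residues,
    count how many of them read C in the case sequence."""
    counts = []
    seen = changed = 0
    for c, s in zip(consensus, case):
        if c != "U":
            continue
        seen += 1
        if s == "C":
            changed += 1
        if seen == group_size:
            counts.append(changed)
            seen = changed = 0
    if seen:  # trailing, incomplete group
        counts.append(changed)
    return counts


# toy example with groups of three U residues instead of twenty
toy_consensus = "UAUAUAUAUAUA"
toy_case = "CACAUACAUAUA"
print(tally_substitutions(toy_consensus, toy_case))
print(u_to_c_per_group(toy_consensus, toy_case, group_size=3))
```

grouping by a fixed number of u residues rather than by a fixed number of nucleotides keeps the counts comparable between u-rich and u-poor regions.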
interestingly, the switch between high and low levels of u to c transition was abrupt at the p-m gene junction but more gradual at the m-f gene junction; in the first 20 u residues of the f gene, distributed over not less than 455 nucleotides, five u to c mutations were detected, whereas in the following groups of 20 u residues, first three and then two or fewer mutations per group were detected. thus, the limits of the genomic regions with high levels of u to c mutations roughly coincide with the limits of the m gene. from table 1, it is also evident that in the m gene of case c, the level of a to g mutations, corresponding to u to c mutations in the other mv genomic strand, was not enhanced. this indicates that the transitions must have been introduced exclusively in one strand, an event that could arise theoretically either by sequential, strand-specific cycles of localized and biased mutations or, more plausibly, by a single hypermutation event. by sequencing five sibling but not identical clones of the m gene of case c, we also noted that in the few positions that were variable between sibling clones (small letters in sequence c of figure 3) we could not detect any u to c mutation; that is, 132 of the 132 u to c transitions were conserved in all five m cdnas of case c. we thus conclude that a single event, rather than a continuous, progressive introduction of u to c transitions, must account for the amazingly high level of u replacements.

figure 3 legend. the t residues in these cdnas correspond to u residues in the mv transcripts. sequence e is from the edmonston strain, sequence h from the street virus hu2 (curran and rima, 1988), sequence q from the strain cam (this paper; see liebert and ter meulen [1987] for a description of this strain), sequence k from sspe case k (cattaneo et al., 1986), sequence i from sspe cell line ip-3-ca, and sequence m from sspe cell line mf (this paper; see cattaneo et al. [1987a] for a description of case mf). the pre sequence is a consensus comprising the nucleotides represented most often in the nine sequences obtained experimentally. the positions differing from the pre sequence are indicated with capital letters (positions diverging in all clones of the same case) or small letters (positions diverging only in some sibling clones). the translation start and stop codons are underlined, as are the mutations leading to amino acid changes. an asterisk in pre indicates a variable position for which no consensus could be defined. position 482 was variable not only within cases but also within sibling clones: it corresponded to g or a in cases i and a and to g or c in case m. a 2 nucleotide deletion at position 1039-1040 in case k is indicated with two deltas. m clones resulting from the elongation of non-m 3' primers hybridizing semispecifically to the gc-rich 3' nontranslated region of the m gene were obtained and completely sequenced: clones pam1, pam2, pam3, pbm1, pbm2, pbm3, pcm1, pcm2, and pcm3 coded for m genes, respectively, 172, 172, 90, 90, 185, 396, 190, 632, and 99 nucleotides shorter at their 3' end. clone pcm6 encoded the whole m gene and an additional 90 nucleotides of the neighboring f gene, and pcm4 encoded the whole m gene and part of the n and p genes. about 1% of the positions could not be defined because of "strong stops" in the sequencing reactions, and these positions were considered as showing no variation from the pre sequence. the genbank accession number of this sequence is j03175.

figure 4 legend. number of mutations to c in 20 u residues. the pre sequence of the m gene is shown in figure 3; the other pre sequences were constructed using the a, b, and c genes and the genes of case ip-3-ca and of the edmonston strain (cattaneo et al., 1986, and references therein).

previous studies indicate that during persistence, mutations are continuously introduced and fixed in mv genomes (cattaneo et al., 1986). this phenomenon is most likely based on the one hand on the low fidelity of rna replication (domingo et al., 1978; for review see steinhauer and holland, 1987) and on the other hand on the low selective pressure exerted on viral genomes in nonlytic infections (holland et al., 1979; rowlands et al., 1980). in an attempt to quantify the level of internal variability of mv genomes in the human brain, we compared the sequences of three overlapping m clones of cases a and b and of five overlapping m clones of case c. internal variability was 0.16% for the three m clones of case a (six differences over 3825 comparable nucleotide pairs), 0.18% for case b (six differences over 3361 pairs), and 0.06% for case c (seven differences over 11,217 pairs). most of these changes are probably due to the mv polymerase itself rather than to the reverse transcriptase used for cloning, since clones obtained with the same technique from another rna source differed in less than 0.02% of their positions (k. baczko, unpublished data). it should be noted that the variability of the mv genomes in case c was lower than in the other two cases, which could be explained by the shorter time of viral persistence in the case of mibe. this result reinforces the suggestion that at the final stage of the mibe infection the mv polymerase was at least as precise as in the two sspe infections. a variability of 0.15% will result in 20-30 differences between any two mv genomes with a length of 16,000 nucleotides, which is a high number even for an rna virus (steinhauer and holland, 1987; cattaneo et al., 1988).

to assess the variability in strains of lytic viruses, and to estimate the number of mutations introduced during persistence, we counted the differences of lytic and persistent viruses from a consensus as defined above. the lefthand panel of figure 5 represents the mutations from the pre consensus (figure 3) detected in the m coding regions of the three lytic viruses, edmonston (e), hu2 (h), and cam (q), and of the six persistent viruses, k, i, m, a, b, and c. in genes from lytic infections, 6-9 differences from the consensus were detected, whereas in genes from sspe persistent infections, 15-25 differences were monitored (130 differences in the mibe case c). thus, we estimated that two to three times more mutations accumulated in the m coding regions of viruses implicated in persistent infections. from table 2, it is clear that this holds true for all the other genes examined, with the exception of the n gene (legend to table 2, note c). if we assume that the lytic viruses that initiated the three persistent infections had accumulated a number of differences from the pre sequence similar to that of the three lytic strains studied, we can extrapolate that 50%-70% of the mutations scored in cases a, b, and c accumulated during the persistent phase of these infections (this may be an underestimation, see the end of this section). knowing that in the three persistent infections 442 mutations from the consensus sequence have been detected over 17,610 nucleotides compared (calculated from table 2, first column), and assuming that 50%-70% of these mutations have been fixed during persistence, we can also extrapolate that 200-300 mutations have been introduced in the mv genome (16,000 nucleotides) during persistence.

using the cdnas described here, it should be possible in principle to establish complementation assays to test the effect of single mutations on gene function, and thus to assess if the point mutations introduced during persistence resulted in slight alterations, gross distortions, or disruption of viral protein functions.
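the variability percentages and the genome-wide extrapolation quoted above can be rechecked with a few lines of arithmetic. the sketch below uses only numbers stated in the text; the variable names are ours, and scaling the mutation density linearly to a 16,000-nucleotide genome is our reading of how the 200-300 figure was obtained.

```python
GENOME_LENGTH = 16_000  # approximate mv genome length used in the text


def percent(differences: int, pairs: int) -> float:
    """Differences per compared nucleotide pair, as a percentage."""
    return 100.0 * differences / pairs


# internal variability among sibling m clones of each case
internal_variability = {
    "a": percent(6, 3825),
    "b": percent(6, 3361),
    "c": percent(7, 11217),
}

# expected differences between two genomes at 0.15% variability
expected_pairwise = 0.0015 * GENOME_LENGTH

# 442 differences from the consensus over 17,610 compared nucleotides,
# scaled linearly to a whole genome
per_genome = 442 / 17610 * GENOME_LENGTH

# 50%-70% of these are assumed to have been fixed during persistence
fixed_low = 0.5 * per_genome
fixed_high = 0.7 * per_genome

for case, value in internal_variability.items():
    print(f"case {case}: {value:.2f}%")
print(f"expected pairwise differences: {expected_pairwise:.0f}")
print(f"fixed during persistence: {fixed_low:.0f} to {fixed_high:.0f}")
```

under these assumptions the scaled count is about 400 differences per genome, so the 50%-70% share corresponds to roughly 200 to 280 mutations, consistent with the 200-300 range given in the text.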
examination of the mv proteins found in brain cells of different sspe patients has shown examples of restricted expression of the f and h proteins, as well as of the m protein (norrby et al., 1985; baczko et al., 1986). in contrast, n and p proteins, the two proteins required together with the polymerase for mv transcription and replication, were always detected. the most straightforward explanation for these observations is that the constraints imposed in persistent infections on the m, f, and h genes are relaxed, since they encode viral functions generally presumed to be dispensable for replication (rosenblatt et al., 1979). if this were the case, we would expect fixation of more mutations causing amino acid changes (replacement site mutations) in the viral envelope protein genes than in the n and p genes. as shown in table 2 (first column), the levels of mutations accumulated in all genes were fairly similar (about 2%), except for the m gene of the mibe case. in the f and h coding regions, respectively, 38% and 24% of the mutations resulted in amino acid changes, a percentage similar to (in fact somewhat lower than) that in the n and p coding regions (32% and 42%, calculated from the results presented in table 2, second column). it should also be noted that, even without considering case c, the m genes had the highest percentage of replacement site mutations, that is, 48%. the low number of replacement site mutations found in the h genes was reflected by the identical migration behavior of all h proteins (figure 1, hemagglutinin genes), and the high number of amino acid changes found in the m genes by the relatively large differences in migration of the m proteins (figure 1, matrix genes). these numbers suggest that the selective pressure acting on the genes directly involved in viral replication was not very different from that acting on the viral envelope genes during persistence. the fact that the differences from the consensus sequence were about twice as frequent in the untranslated regions of all the genes compared with the respective coding regions (table 2, third column and notes a and b) reinforces the suggestion that selective pressures to preserve protein functions remained in effect.

table 2 notes. for this computation, two variable positions in the pre sequence (marked with an asterisk in figure 3) were not considered.
(a) total differences in the coding regions of the a, b, and c cases: n gene, 1.2%; p gene, 1.3%; m gene, 5.5% (1.7% if case c is not considered); f gene, 1.5%; and h gene, 1.4%.
(b) total difference in all the noncoding regions of the three cases: 3.8%.
(c) the sequence of the edmonston n gene diverges from the consensus in about twice as many positions as the sequences of the other edmonston genes. it is remarkable that 18 of the 22 differences were concentrated in the last 700 bases.
(d) in these genes, 90 bases of the 3' noncoding region and 23 of the 5' noncoding region could not be determined.
(e) the relatively low incidence of mutations in the untranslated region of the m gene of case c is due to the scarcity of u residues.

on the basis of the sequence comparisons presented in figure 3, the three lytic viruses fall in a separate subclass from the six persistent viruses. if an m gene consensus sequence is constructed by considering only the lytic viruses, each of the "lytic" m coding regions differs only in 2-4 positions from the "lytic consensus," in contrast with the persistent viruses, which differ in 19-31 positions (136 differences in case c; right panel of figure 5). this suggests that the number of mutations introduced during persistence, which was estimated above at 50%-70% of the total differences from the pre sequence, might in fact be as high as 80%-90%. the definition of a "lytic consensus" different from the pre sequence implies that lytic viruses can be distinguished from persistent viruses on the basis of their sequences at characteristic sites.
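the consensus comparison described above (take the most frequent nucleotide at each aligned position, then count how many positions of each sequence deviate from it) can be sketched as follows. this is a minimal illustration with invented six-nucleotide sequences; the function names are ours, and marking ties with an asterisk follows the convention stated for the pre sequence rather than any published code.

```python
from collections import Counter


def consensus(seqs: list) -> str:
    """Majority-rule consensus of aligned, equal-length sequences.
    Positions with no single most frequent nucleotide are marked '*'."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    out = []
    for column in zip(*seqs):
        ranked = Counter(column).most_common(2)
        tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        out.append("*" if tie else ranked[0][0])
    return "".join(out)


def differences(seq: str, cons: str) -> int:
    """Number of positions at which seq deviates from a defined consensus position."""
    return sum(1 for s, c in zip(seq, cons) if c != "*" and s != c)


# invented toy sequences, three "lytic" and three "persistent"
lytic = ["AUGGCU", "AUGGCU", "AUGACU"]
persistent = ["AUGGCC", "ACGGCC", "AUGGCC"]

lytic_consensus = consensus(lytic)
overall_consensus = consensus(lytic + persistent)

print(lytic_consensus, overall_consensus)
print([differences(s, lytic_consensus) for s in lytic])
print([differences(s, lytic_consensus) for s in persistent])
```

in the toy data the last position is split evenly in the overall consensus and is therefore undefined, whereas the consensus built from the lytic sequences alone resolves it, so every persistent sequence scores an extra difference there. this mirrors why counting against a "lytic consensus" raises the estimated share of changes acquired during persistence.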
this observation, if confirmed on a larger sample of genes, might have important practical applications: diagnostic differences might be applied for tracing the source of viruses causing measles epidemics or persistent infections. moreover, vaccine strains could be selected on the basis of their sequences, and finally, safer vaccines possessing all genomic characteristics defined as favorable in different strains could be engineered. previously, the occurrence of viral mutations in sspe cases has been documented (cattaneo et al., 1986). however, it was never clarified whether certain mutations constitute a prerequisite for the development of the disease. mutations might simply be a corollary phenomenon, to be explained by the release of selective pressure exerted on viral genomes that need only replicate and spread from cell to cell but that do not have to provide all the functions necessary for the assembly of infectious virus particles. although the present study still does not directly establish a causal relationship between mutations and disease, two experimental findings presented here strongly support this hypothesis: first, the mechanism of m gene function inactivation by hypermutation in the mibe case; and second, the very extensive and apparently directed drift separating all three persisting mv genomes analyzed from the infecting viruses. defective expression of m protein has been previously revealed in sspe cases (wechsler and fields, 1978; hall et al., 1979; carter et al., 1983). in particular, both complete absence of m protein (hall and choppin, 1981) and presence of nonfunctional (i.e., unstable) m protein (sheppard et al., 1986) have been reported. in the present study, both of these possibilities were found in the three cases analyzed.
in case b, we could monitor the efficient in vitro production of an m protein of approximately correct size from synthetic transcripts of a cdna clone, in spite of the fact that such a protein could not be detected in the brain autopsy of this patient (baczko et al., 1986), an observation that can be explained by postulating rapid m protein degradation in vivo. in case c, no m protein could be produced: the proteins synthesized inefficiently in vitro from the synthetic m mrnas had grossly altered termini and dozens of mutated amino acids. the particular interest of this case resides in the mechanism of m inactivation; the analysis of the m and four other genes of this case indicated that a single, biased hypermutation event was most likely responsible for the selective silencing of m gene function. since a lytic virus with intact m function must have been at the origin of the mibe infection, we must conclude that the hypermutation event did not severely affect the efficiency of this genome to replicate. instead, this event must have conferred a selective advantage for the spread of the mutated genome in the brain, because only mutated genomes were detected at death. thus, for case c, our study provides a direct correlation between m function silencing and mutational change, which in this case came about by a probably unique and grossly distorting event. in other words, it seems very likely that the propagation of this lethal infection in the human brain originated from a single genomic clone of mv. nevertheless, m function silencing might not be obligatory for persistence, as suggested by the detection of m proteins in some sspe cases (norrby et al., 1985; baczko et al., 1986), including case a reported here, where an m protein of approximately the correct size was detected both in vitro and in vivo. however, we do not know whether these m proteins are functionally competent.
on the other hand, in case a, the carboxyl terminus of the f protein has been structurally altered. a similar alteration of the f protein, mediated by a different mutation, was identified in case b. this indicates that f protein function might be slightly or severely impeded in both cases a and b. in summary, gross alterations have been found so far only in m proteins, and less severe modifications in f proteins. it remains to be seen whether all these changes, and/or more subtle changes in these and other viral proteins, might not also contribute to propagation of mv persistent infections in brain cells. the second argument in favor of the view that some mutations are instrumental for the development of brain infections is provided by the features of the populations of viral genomes present in brains. in rna virus populations maintained at constant selective pressures, genomic variability is high, but a stable consensus sequence is established that usually changes minimally (domingo et al., 1978; holland et al., 1979). in contrast, when selective pressures change, viral rna genomes do not maintain the consensus, but rapidly evolve by selection of the fittest (holland et al., 1979; rowlands et al., 1980). in cases a and b, the populations of viral genomes show an internal variability about ten times lower than the estimated number of changes acquired during persistence (20-30 variable positions versus 200-300 acquired changes). in case c, the internal variability is even lower and the number of acquired changes is of the same order as in the other cases if the changes introduced by the hypermutational event are disregarded. this strongly supports the argument that mutated genomes must have been selected one or more times, conferring a direction to the evolution of the system. thus, viral mutations might indeed favor viral persistence if, instead of compromising propagation of infection, they promote it in the particular environment of brain cells.
obviously, such evolved viral genomes can never become manifest as new viral strains because they are unable to propagate beyond the life span of their host. the fact that sspe and mibe arise so rarely might indicate that combinations of mutations favoring propagation of persistent infections are infrequent. moreover, it may well be that persistent infections can be established only in cases where some host defense mechanisms fail (fujinami and oldstone, 1979; carrigan and kabacoff, 1987). this is also suggested by the fact that mibe, a complication typical for immunosuppressed individuals, arises more frequently in such patients than sspe in untreated individuals. similar considerations might apply for other viral infections known or suspected to be the cause of several human syndromes (wolinsky and johnson, 1980).

a biased rna polymerase?

we are not aware of any other documented case where genetic information involving an entire gene has been distorted so drastically, most likely in a single event. the recently described extensive editing of kinetoplastid mitochondrial transcripts by uridine addition and deletion results in a spectacular modification of the mrnas, but not in alteration of the gene (shaw et al., 1988; feagin et al., 1988). it must be mentioned, however, that a similar exclusive mutation of one type of nucleotide to another has been described, albeit in a much more restricted region, in the related vesicular stomatitis virus (vsv). in that instance, analogous a to g transitions (14 of 29 positions considered) were detected in a short region (51 nucleotides, intrinsically very rich in a residues) of a defective interfering (di) genome (o'hara et al., 1984; note that our u to c mutations, as written in plus strand polarity, might have been introduced in the minus strand genome as a to g transitions). the question remains as to how the mutational cluster in the mv genome of the mibe case could have arisen.
in principle, mutations of this kind could be introduced either by chemical mutagens or by imprecise polymerases. the mibe patient had been subjected to a large variety of immunosuppressive and cytostatic drugs, including potential mutagens (roos et al., 1981) . however, the level of mutations observed here is much higher than the level of mutations induced by any chemical mutagen (singer and kusmierek, 1982) . even assuming that a chemical mutagen in a living cell could induce mutations leading to the replacement of 50% of the u residues, it would be very difficult to explain how these mutations could have selectively affected a defined region of a nonsegmented genome. to account for this, homologous recombination of an mv "standard" genome with a hypermutated mv genome or an mrna that coexisted in the same cell would have to be invoked. homologous recombination involving breakage and joining between preexisting strands as with dna is not documented for rna. the apparent recombination events common in positive strand rna viruses probably take place by a copy choice mechanism, in which the viral rna polymerase switches template during rna replication (king et al., 1982; kirkegaard and baltimore, 1986; keck et al., 1988) . in contrast, in negative strand rna viruses, nonhomologous recombination is less common, and homologous recombination has not yet been reported (jennings et al., 1983; o'hara et al., 1984; for review see steinhauer and holland, 1987) . a much more plausible explanation of the observed phenomenon is that one particular part of a genome is synthesized by a biased mv rna polymerase complex nonselectively incorporating u or c residues when copying an a. two prerequisites have to be met in this model: first, biased mv polymerase complexes must occasionally occur in an infected cell; and second, biased and faithful polymerase complexes must act in succession during the synthesis of one progeny rna on one rna template. 
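the biased polymerase model outlined above can be made concrete with a small simulation. this is a sketch under stated assumptions, not a description of the real enzyme: the template is a random hypothetical sequence, it is copied position by position without reversing strand orientation, and inside one window a "biased" complex copies each template a as c with probability 0.5 and as u otherwise, while copying elsewhere is faithful. the function name, window coordinates, and probability are illustrative choices.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def copy_template(template: str, biased_region: tuple,
                  rng: random.Random, p_c: float = 0.5) -> str:
    """Copy template position by position (orientation ignored for simplicity).
    Inside biased_region (start inclusive, end exclusive) a template A is
    copied as C with probability p_c and as U otherwise; all other copying
    is faithful."""
    start, end = biased_region
    product = []
    for i, base in enumerate(template):
        nucleotide = COMPLEMENT[base]
        if base == "A" and start <= i < end and rng.random() < p_c:
            nucleotide = "C"  # misincorporation opposite a template A
        product.append(nucleotide)
    return "".join(product)


rng = random.Random(0)
template = "".join(rng.choice("ACGU") for _ in range(600))

faithful = copy_template(template, (0, 0), rng)        # no biased window
mutated = copy_template(template, (200, 400), rng)     # biased window in the middle

changes = [(i, f, m) for i, (f, m) in enumerate(zip(faithful, mutated)) if f != m]
u_in_window = faithful[200:400].count("U")
print(f"{len(changes)} of {u_in_window} u residues in the window read c")
```

by construction every change in the product is a u to c replacement confined to the window, and with p_c set to 0.5 roughly half of the u residues there are expected to be affected, which is the qualitative pattern reported for the m gene of case c. the simulation says nothing about whether such a complex exists; it only shows that the model is sufficient to produce a one-strand, region-limited bias.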
the polymerase complex of nonsegmented negative strand rna viruses is composed of the large polymerase itself and a phosphoprotein, tightly associated with each other and with the genomic (or antigenomic) ribonucleocapsids, that is, rnas enwrapped with nucleocapsid protein molecules (banerjee, 1987). it has been shown that the polymerases and phosphoproteins are distributed in discrete clusters on cytosolic ribonucleocapsids (portner and murti, 1986; portner et al., 1988), and it is conceivable that these clusters correspond to polymerase complexes reorganizing during replication (or transcription), possibly by exchanging parts of their components. rna polymerase complexes giving rise to biased errors could arise because they are constituted from genetically altered subunits, because normal subunits assemble in a defective fashion, or because normal rna polymerase complexes can temporarily assume a distorted conformation. evidence for the existence of conformationally "perturbed" rna polymerase complexes, introducing either c or u residues when copying an a after a triggering error, but then returning to the normal fidelity, was obtained from a vsv di genome; when rare rna molecules were analyzed in which a particular misincorporation had occurred, it was found that in a position situated two nucleotides downstream of the misincorporated nucleotide, in 20%-50% of the molecules, a c residue was incorporated instead of a u (steinhauer and holland, 1986). remarkably, all other nucleotides, including u residues further downstream, were incorporated with normal precision. the stabilization of a "perturbed" conformation of the polymerase complex could result in nucleotide transitions in short (o'hara et al., 1984) or longer (case c) genomic stretches of negative strand rna viruses.
in an alternative version of this model, the coexistence of normal and genetically altered rna polymerases on a single template, and a relay race of several polymerase complexes during replication, is postulated. in this view, the growing end of the replicating mv genome might occasionally be taken over by an entirely new strand elongation complex, or single components of the complex might be exchanged. such events could well constitute an intrinsic property of the polymerase reaction during the transcription mode of ribonucleocapsids, that is, during the formation of single mrna molecules from antigenomic templates where a stop-start mechanism of the polymerase at gene junctions has been postulated (for discussion see gupta and kingsbury, 1985). during the replication mode, analogous exchanges might occur, either regularly at gene junctions or occasionally by mistake. it should be possible to ascertain such a patchwise mode of polymerization with in vitro transcription-replication systems.

patients

patients a and b (patients 1 and 2 in baczko et al., 1986) were 9- and 10-year-old children who showed the first sspe symptoms years after primary mv infection and died 3 and 6 months later, respectively. patient c was a 3-year-old child who died of mibe 22 months after the diagnosis of leukemia, 4 months after clinical measles, and 2 months after the first symptoms of neurological disease (roos et al., 1981).

for selective full-length cdna cloning of five mv-specific genes using specific primers (schmid et al., 1987), 2.5 µg of polyadenylated brain rna prepared as described (cattaneo et al., 1987b) were used. clones in pbluescript were identified by colony hybridization and restriction and amplified as described (cattaneo et al., 1988). a large majority of the n, p, f, and h clones were full-length, but some of the m clones were incomplete at their 3' end (legend to figure 3). another unexpected finding was that e. coli-containing plasmids in which the h gene was cloned downstream of the lac promoter of pbluescript grew reproducibly slower, reaching lower densities and yielding only small quantities of plasmid dna. this is most likely due to a deleterious effect of h protein on e. coli; when the h insert was in the "antisense" orientation as compared with the lac promoter, normal growth occurred.

in vitro transcription

in vitro transcription, in the presence of the cap analog diguanosine triphosphate (g(5')ppp(5')g; pharmacia, uppsala, sweden) and minute amounts of [32p]gtp to quantify synthesis, was accomplished with t3 polymerase according to the instructions of the supplier (genofit, geneva, switzerland). in general, about 2 µg of transcripts were obtained from 1 µg of template using 0.5 mm concentrations of ribo-atp, -ctp, and -utp, 0.125 mm of ribo-gtp, and 0.625 mm g(5')ppp(5')g. this corresponds approximately to a 1:6 molar ratio of template to product. the products were about 90% full-length, as judged by gel electrophoresis.

chain termination sequencing of alkali-denatured plasmid dna was performed using deoxyadenosine 5'-[35s]thiotriphosphate (850 ci/mmol, amersham international, england) and t7 dna polymerase (sequenase, united states biochemical corporation, cleveland, oh) basically according to the protocol of the supplier. the primers used for sequencing the m and n genes were the same as those used in cattaneo et al. (1988). since one primer did not hybridize efficiently to the m clones of case c, it was substituted by primer (-) 772-753 (positions as in bellini et al. [1988]). for sequencing the ends of the p, f, and h clones, commercial primers (new england biolabs, beverly, ma) hybridizing with flanking plasmid sequences were used. to sequence over the large f gene 5' untranslated region, the aug, and the f2/f1 processing site, two primers (+) 322-338 and (+) 625-640 were used (positions as in richardson et al. [1986]).
expression of defective measles virus genes in brain tissues of patients with subacute sclerosing panencephalitis
restriction of measles virus gene expression in measles inclusion body encephalitis
the transcription complex of vesicular stomatitis virus
matrix genes of measles virus and of canine distemper virus: cloning, nucleotide sequences, and deduced amino acid sequences
identification of a nonproductive, cell-associated form of measles virus by its resistance to inhibition by recombinant human interferon
defective translation of measles virus matrix protein in subacute sclerosing panencephalitis
accumulated measles virus mutations in a case of subacute sclerosing panencephalitis: interrupted matrix protein reading frame and transcription alteration
altered transcription from a defective measles virus genome derived from a diseased human brain
altered ratios of measles virus transcripts in diseased human brains
multiple viral mutations rather than host factors cause defective measles virus gene expression in a subacute sclerosing panencephalitis cell line
fluorographic detection of radioactivity in polyacrylamide gels with the water-soluble fluor, sodium salicylate
ribosomal initiation from an acg codon in the sendai virus p/c mrna
nucleotide sequence of the gene encoding the matrix protein of a recent measles virus isolate
nucleotide sequence heterogeneity of an rna phage population
measles virus nucleic acid sequences in human brain
studies on an attenuated measles virus vaccine: techniques for assay of effects of vaccination
extensive editing of cytochrome c oxidase iii transcript in trypanosoma brucei
antiviral antibody reacting on the plasma membrane alters measles virus expression inside the cell
quantitative basic residue requirements in the cleavage-activation site of the fusion glycoprotein as a determinant of virulence for newcastle disease virus
polytranscripts of sendai virus do not contain intervening polyadenylate sequences
measles virus proteins in the brain tissue of patients with subacute sclerosing panencephalitis
measles and subacute sclerosing panencephalitis virus protein: lack of antibodies to the m protein in patients with subacute sclerosing panencephalitis
characterization of cloned measles virus mrnas by in vitro transcription, translation, and immunoprecipitation
evolution of multiple genome mutations during long-term persistent infections by vesicular stomatitis virus
does the higher order structure of the influenza virus ribonucleoprotein guide sequence rearrangements in influenza viral rna?
in vivo rna-rna recombination of coronavirus in mouse brain
the mechanism of rna recombination in poliovirus
comparison of initiation of protein synthesis in procaryotes, eucaryotes, and organelles
persistence of rna viruses in the central nervous system
cleavage of structural proteins during the assembly of the head of bacteriophage t4
virological aspects of measles virus induced encephalomyelitis in lewis and bn rats
endoproteolytic cleavage of gp160 is required for the activation of human immunodeficiency virus
a single amino acid substitution in a histidine-transport protein drastically alters its mobility in sodium dodecyl sulfate-polyacrylamide gel electrophoresis
measles virus matrix protein detected by immune fluorescence with monoclonal antibodies in the brain of patients with subacute sclerosing panencephalitis
vesicular stomatitis virus defective interfering particles can contain extensive genomic sequence rearrangements and base substitutions
characterization of the measles virus isolated from the brain of a patient with immunosuppressive measles encephalitis
localization of p, np, and m proteins on sendai virus nucleocapsids using immunogold labeling. virology
antibodies against sendai virus l protein: distribution of the protein in nucleocapsids revealed by immunoelectron microscopy
the nucleotide sequence of the mrna encoding the fusion protein of measles virus (edmonston strain): a comparison of fusion proteins from several different paramyxoviruses
immunologic and virologic studies of measles inclusion body encephalitis in an immunosuppressed host: the relationship to subacute sclerosing panencephalitis
virus protein changes and rna termini alterations evolving during persistent infection
infective substructures of measles virus from acutely and persistently infected cells
a procedure for selective full length cdna cloning of specific rna species
preliminary tests of a highly attenuated measles vaccine
editing of kinetoplastid mitochondrial mrnas by uridine addition and deletion generates conserved amino acid sequences and aug initiation codons
rapid degradation restricts measles virus matrix protein expression in a subacute sclerosing panencephalitis cell line
chemical mutagenesis
direct method for quantitation of extreme polymerase error frequencies at selected single base sites in viral rna
rapid evolution of rna viruses
subacute sclerosing panencephalitis
isolation and characterization of the measles virus f1 polypeptide: comparison with other paramyxovirus fusion proteins
influenza virus pathogenicity: the pivotal role of hemagglutinin
differences between the intracellular polypeptides of measles and subacute sclerosing panencephalitis virus
role of viruses in chronic neurological diseases

we thank charles weissmann for helpful discussions, bert rima for communicating unpublished data, isidro ballart for part of the m gene sequence of case mf, hugh pelham, pramod yadava, and deborah maguire for critical comments on the manuscript, and fritz ochsenbein for the photographs.
this work was supported by grant 3.141-085 of the schweizerische nationalfonds, by the kanton zürich, and by the deutsche forschungsgemeinschaft. the costs of publication of this article were defrayed in part by the payment of page charges. this article must therefore be hereby marked "advertisement" in accordance with 18 u.s.c. section 1734 solely to indicate this fact. received may 26, 1988. key: cord-305290-xnjwv0d7 authors: atkins, john f.; weiss, robert b.; gesteland, raymond f. title: ribosome gymnastics—degree of difficulty 9.5, style 10.0 date: 1990-08-10 journal: cell doi: 10.1016/0092-8674(90)90007-2 sha: doc_id: 305290 cord_uid: xnjwv0d7 nan during translational elongation, the paired codon and anticodon can sometimes disengage at certain sequences, allowing the mrna to slip with respect to the ribosome-peptidyl-trna complex. the anticodon may then re-pair with a now-nearby similar codon, so that synthesis continues downstream. on a run of 4 identical bases the reengagement may occur at a codon 1 base removed from the original in-frame codon, with a resultant frameshift. "slipping" of this type is part of the mechanism for many of the examples of -1 or +1 frameshifting described below. if the shift occurs over a considerable distance without intermediate pairing, however, the ribosome hops down the mrna. hopping requires a "takeoff site" codon and a similar sequence acting as a "landing site" immediately 5' of the codon for the next amino acid on the resumption of synthesis. hopping was first encountered over short distances by inserting test sequences early in the lacz gene of escherichia coli (weiss et al., 1987). for instance, cuu uag cua (leu stop leu) was decoded with an efficiency of 1% as a single leucine from the 9 nucleotides. hopping was also detected when the takeoff and landing sites overlapped, as in the sequence wua (weiss et al., 1987; o'connor et al., 1989).
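the short-distance hop on cuu uag cua can be sketched in code. this is a minimal illustration under stated assumptions, not the authors' method: the function names, the partial codon table, the trailing test codons (gcu aaa), the 60-nucleotide search limit, and the rule that a landing site only has to share its first two bases with the takeoff codon are all hypothetical choices made for the example.

```python
# Partial codon table: only the codons used in this illustration (assumption).
CODON_TABLE = {"cuu": "leu", "cua": "leu", "uag": "stop", "gcu": "ala", "aaa": "lys"}


def find_landing_site(mrna, takeoff, max_gap=60):
    """Return the index of the nearest downstream triplet that resembles the
    takeoff codon, or None. 'Resembles' means the first two bases match,
    which is a simplifying assumption of this sketch."""
    takeoff_codon = mrna[takeoff:takeoff + 3]
    last_start = min(takeoff + max_gap, len(mrna) - 3)
    for j in range(takeoff + 1, last_start + 1):
        if mrna[j:j + 2] == takeoff_codon[:2]:
            return j
    return None


def translate_with_hop(mrna, takeoff_index, max_gap=60):
    """Translate codon by codon. After decoding the codon at takeoff_index,
    jump to the landing site and resume with the codon immediately 3' of it.
    Pass takeoff_index=None for ordinary translation."""
    peptide = []
    i = 0
    while i + 3 <= len(mrna):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "xaa")  # xaa = not in table
        if amino_acid == "stop":
            break
        peptide.append(amino_acid)
        if i == takeoff_index:
            landing = find_landing_site(mrna, i, max_gap)
            if landing is not None:
                i = landing + 3  # the landing triplet itself is not decoded again
                continue
        i += 3
    return peptide


# The 9 nucleotides cuu uag cua yield a single leucine when the hop occurs.
print(translate_with_hop("cuuuagcua", 0))
# With two extra (hypothetical) codons, translation continues past the stop.
print(translate_with_hop("cuuuagcuagcuaaa", 0))
```

in this sketch the stop codon is never presented to the decoding site when the hop occurs, which is why the landing site must sit immediately 5' of the next codon to be read.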
at about the same time, trna mutants were isolated that increased the hopping at certain sites (falahee et al., 1988; hughes et al., 1989), and hopping was detected over as many as three stop codons, albeit at decreased efficiency. the mutants have an extra base in a trna anticodon (o'connor et al., 1989) that somehow promotes hopping. one inference to be made from their study is that there may be good reasons why almost all natural trnas have seven-membered anticodon loops! even with these precedents, the discovery of high level, natural, programmed hopping was a surprise. the 50 nucleotides that separate codon 46 from codon 47 in the mature message of phage t4 topoisomerase subunit gene 60 are bypassed by the translation apparatus with an efficiency approaching 100% (huang et al., 1988). several key features required for ribosomal bypass of this coding gap have been defined utilizing variants generated as gene 60-lacz fusions (weiss et al., 1990a). the analogy to low level and trna suppressor-mediated hopping is supported by a strict requirement for a matched set of codons at the takeoff and landing sites. as is the case with all high level unusual ribosomal frameshift or readthrough sites, the interesting question is how the message conspires with the translation apparatus to increase the efficiency and scope of these events. in the gene 60 case, there are at least four distinct elements that contribute significantly to the bypass. three of these elements are located at the coding gap: the matched codon set defining its borders, a stop codon at the 5' junction of the gap contained within a short stem-loop structure, and an optimal 50 nucleotide spacing separating the 5' and 3' junctions. the fourth, and most surprising, feature is a stringent requirement for a specific amino acid sequence in the nascent peptide translated from the 46 codons preceding the coding gap.
this nascent peptide enables the ribosome that has just synthesized it to bypass the coding gap, although its mode of action is undefined. this nascent-chain effect adds another example to an expanding list of interesting translation events mediated through the nascent protein chain, such as signal recognition particle arrest of elongation (wolin and walter, 1988), autoregulated instability of β-tubulin mrna (yen et al., 1989), and regulation of the carbamoylphosphate synthetase a gene, cpa1, in saccharomyces cerevisiae (werner et al., 1987). another example of high level natural hopping could be in the cara gene of pseudomonas aeruginosa, which encodes the small subunit of carbamoyl-phosphate synthetase (wang and abdelal, 1990). two sets of codons that could potentially act as the takeoff and landing sites occur at nucleotides 9 to 15 and 21 to 27 downstream of the start codon. in contrast to the gene 60 case, the untranslated 12 nucleotides do not contain a stop codon. since this putative example has just been found, the critical features are unknown, but cannot fail to be interesting. and stimulators for +1 and -1 frameshifting the great majority of ribosomal frameshifting events studied are due to a trna slipping from pairing with its correct in-frame codon to an overlapping -1 or +1 cognate codon. a string of four or more single base repeats constitutes a "slippery run" prone to frameshifting. in some instances the run is minimal: for instance, in decoding e. coli polypeptide chain release factor 2 (rf2) (craigen et al., 1985) the run is cuu u (weiss et al., 1987). a cuu-decoding trnaleu slips +1 to a uuu sequence using a g:u pair in the first position. the lack of perfect classical complementarity in re-pairing may mean there are relaxed rules for re-pairing (first position wobble).
shiftiness in this instance does not uniquely depend on the trnaleu: when the cuu u string is replaced by guu u or ggg u, then trnaval or trnagly, respectively, performs high level shifting (weiss et al., 1987). as first discovered in retroviruses, the possibility for two adjacent trnas to shift in tandem can for some pairs increase the level of shifting higher than the sum of shiftiness at either codon in isolation (jacks et al., 1988a). the tandem slippery sequences a aaa aac, u uuu uua, and g gga aac are common in retroviral shift sites (the upstream a, u, or g being essential). the shift in reading frame, which is -1, occurs predominantly at the second codon of the slippery pair (hizi et al., 1987; jacks et al., 1988a). coronaviruses use a combination of the above slippery sequences, namely u uua aac, for their frameshifting (brierley et al., 1987, 1989; bredenbeek et al., 1990). the mechanism underlying -1 frameshifting at tandem slippery codons appears to be the most universally conserved of frameshifting signals, given that retroviral shift sites can catalyze efficient -1 shifting when translated in e. coli (weiss et al., 1989). a single base change in the mouse mammary tumor virus (mmtv) gag-pro shift site, from the normal a aaa aac to a aaa aag, surprisingly raises -1 frameshifting at this sequence from ~2% to 50% (a 25-fold increase); and the 10-fold decrease between a aaa aag and a aaa aaa affords an interesting glimpse at how the nuances of codon-anticodon interaction can govern the efficiency of this type of shifting. the high level of shifting at a aaa aag is, in fact, utilized in the decoding of an e. coli gene, dnax, which encodes two dna polymerase iii subunits (flower and mchenry, 1990; blinkowa and walker, 1990; tsuchihashi and kornberg, 1990).
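the tandem slippery sequences listed above share one shape: three identical bases, three identical bases, then one more base (often written x xxy yyz). a short sketch of how such sites could be located in a message follows. the function names and the regular expression are hypothetical, and the pairing check encodes only the common simplification that each trna keeps its first two codon positions paired after a -1 slip; it says nothing about shift efficiency, which the text shows depends on the final base and on stimulators.

```python
import re

# Three identical bases, three identical bases, any base (x xxy yyz).
# The lookahead lets overlapping candidate sites be reported.
HEPTAMER = re.compile(r"(?=(([acgu])\2\2([acgu])\3\3[acgu]))")


def find_tandem_slippery_sites(mrna):
    """Return (position, heptamer) for every candidate tandem slippery site."""
    return [(m.start(), m.group(1)) for m in HEPTAMER.finditer(mrna.lower())]


def repairs_after_minus_one(heptamer):
    """Check that both 0-frame codons share their first two bases with the
    overlapping -1-frame codons, so both trnas could re-pair after slipping
    back one base (a simplified criterion assumed for this sketch)."""
    first_0, second_0 = heptamer[1:4], heptamer[4:7]    # 0 frame: xxy yyz
    first_m1, second_m1 = heptamer[0:3], heptamer[3:6]  # -1 frame: xxx yyy
    return first_0[:2] == first_m1[:2] and second_0[:2] == second_m1[:2]


for site in ["aaaaaac", "uuuuuua", "gggaaac", "uuuaaac", "aaaaaag"]:
    print(site, find_tandem_slippery_sites(site), repairs_after_minus_one(site))
```

by this criterion the ty shift site cuu agg c is not a tandem slippery sequence, which is consistent with its being treated separately in the text as a +1 site.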
frameshifting is utilized by one bacterial is1 ("insertion sequence 1") element (sekine and ohtsubo, 1989), and by extrapolation from the known sequences, is likely to be utilized by at least some members of the is3 family as well (prère et al., 1990). in several of these examples, the shift site is again likely to be a aaa aag. shifty runs alone, however, are not sufficient for high level shifting. secondary signals programmed in the mrna augment shifting at the slippery sequence to give high levels of frameshifting. we call these signals "stimulators," and they are very diverse (figure 1). for the +1 shift for decoding rf2, two stimulators are utilized. one is a shine-dalgarno sequence located 3 nucleotides 5' of the shift site that interacts with the 3' end of 16s ribosomal rna of elongating ribosomes (weiss et al., 1988a). this finding leads to the surprising conclusion that ribosomal rna is continuously monitoring mrna sequences during translation. the second stimulator is a uga terminator at codon 26 flanking the shift site on its 3' side (weiss et al., 1987; curran and yarus, 1988). the stimulators act independently with substantial activity, but their effects are synergistic. use of both stimulators means that rf2 frameshifting utilizes both an interaction normally involved in initiation and one involved in termination, within the middle of a decoding region! at least in e. coli, a role of 3' flanking termination codons in promoting frameshifting has been uncovered with constructs made in a lacz reporter system. this role of stop codons has been extensively investigated, and while it may act by causing a long pause in decoding, the alternative of an abortive termination event prior to release factor binding needs to be seriously considered (weiss et al., 1990b).
the 5' rf2 stimulator, the shine-dalgarno sequence, which augments a +1 shift, has the effect of forcing the mrna in the direction it normally moves (3' to 5'; the ribosome normally moves 5' to 3' with respect to the message), while the -1 shift at the retroviral tandem slippery sequence forces the mrna backward (5' to 3'). when these two components are put together in an artificial hybrid, the shine-dalgarno stimulator for rf2 dampens the retroviral shift. in other words, a stimulator for a +1 shift can act as an inhibitor for a -1 shift (weiss et al., 1990b). the stimulator for several of the retroviral shifts, and also the coronaviral and dnax shifts, is 3' mrna sequences probably in the form of stem-loop structures. a stem-loop structure can be drawn 7 nucleotides downstream of the actual, or putative, shift site in many retroviruses (jacks et al., 1988a; le et al., 1989). in the absence of the stimulator, much less frameshifting occurs at the second codon of the pair, revealing a low level shift at the first codon. evidence for the involvement of the stimulator loop region in additional pairing to form a pseudoknot has been presented for the coronavirus avian infectious bronchitis virus (ibv) (brierley et al., 1989). the involvement of a pseudoknot has been inferred for another coronavirus, mouse hepatitis virus (bredenbeek et al., 1990), for some, but not all, retroviral examples (brierley et al., 1989; ten dam et al., 1990), and in a variety of other instances (ten dam et al., 1990). when the 3' sequence is deleted, there is a great reduction in coronaviral or mmtv frameshifting. how pseudoknots affect frameshifting is not clear, but is likely to be more sophisticated than an effect on decreasing or increasing the stability of the stem-loop structure. the question does, however, raise the general issue of how ribosomes open up mrna secondary structure. jacks et al.
(1987, 1988a) proposed that the stem-loop structures appropriately positioned downstream of the retroviral shift sites cause pausing of ribosomes such that the shifting on the double slippery codons at the decoding site prior to peptide bond formation is facilitated. while the leading ribosome may be caused to pause by certain stem-loop structures, if the following ribosomes are closely spaced they may not encounter the stem-loop structure in the same way. wolin and walter (1988) have found that eukaryotic ribosomes can be tightly packed behind the leading ribosome at the stall site they examined in a preprolactin mrna in vitro. these results should be interpreted with caution, as all the higher eukaryotic frameshifting studies with altered sequences have been done with a reticulocyte lysate cell-free translation system; there are likely to be differences in the number of ribosomes loaded per message, and perhaps in the trna balance, from less specialized cells in vivo. experiments with tissue culture cells are clearly needed, especially since reticulocyte lysate protein synthesis experiments, or studies in yeast cells, showed little or no effect of 3' sequences on human immunodeficiency virus (hiv) frameshifting. whether this is due to a minor quirk of the in vitro system or is a basic difference of hiv from other retroviruses has not yet been determined. the hiv family of retroviruses is different in many important ways from the other retroviruses in which frameshifting has been studied, and it will be interesting to compare any hiv frameshifting stimulator, when found, with that of the other viruses. results from tissue culture cells will need to be compared with the findings from infected cells where altered trna modification has been reported. the trnaphe present in cells infected with a retrovirus that utilizes a u run as the slippery sequence lacks the highly modified wye base in its anticodon.
similarly, the trnaasn decoding aac in the a aaa aac sequence lacks the q base in its anticodon (hatfield et al., 1989). as the earlier results have shown, these undermodifications are clearly not essential for frameshifting, but to what extent they contribute needs to be assessed. the in vitro results must not be treated lightly, however. the amino acid sequence of the frameshift junction of in vivo synthesized gag-pro fusion polyprotein from mmtv has been determined, and shown to result from a shift at the same site as the in vitro product (hizi et al., 1987). even in e. coli cells, frameshifting on retroviral sequences is augmented in a similar way by the same stimulators as in the reticulocyte lysate system (weiss et al., 1989). this is surprising in view of the divergence of prokaryotic and eukaryotic ribosomes. there appears to be conservation at the ribosomal level of the essential components for this type of shifting. a model for retroviral shifting based on hybrid three-site (a, p, and e) decoding (moazed and noller, 1989) has been presented (weiss et al., 1989). in this variant of the original model, shifting occurs after transpeptidation and perhaps during translocation itself. for the +1 shifting with the yeast ty ("transposon yeast") elements (clare and farabaugh, 1985; mellor et al., 1985) the shift site, cuu agg c, does not at first glance look like a slippery repetitive run used in retroviral shifting (belcourt and farabaugh, 1990). however, for the trnaleu that decodes the 0 frame cuu, it is a slippery sequence. this trna has the anticodon 3'-gau-5', with an unmodified uracil in its wobble position, so is able to decode the +1 frame uua after slipping forward 1 base. the stimulator in this case is a 3' adjacent rare arginine codon, agg, decoded by a minor trna (see belcourt and farabaugh, 1990) that might cause a pause in translation. when the level of the minor trnaarg is increased, the level of frameshifting decreases (xu and boeke, 1990).
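the effect of the +1 slip on how the 7 nucleotides cuu agg c are read can be shown with a small sketch. the function name and its arguments are hypothetical, and the code only re-partitions the sequence into codons; it does not model the trna, the pause at agg, or the efficiency of the shift.

```python
def codons_after_shift(mrna, shift_after_codon, shift):
    """Split mrna into codons, moving the reading frame by `shift` bases
    once the codon with index `shift_after_codon` has been read.
    shift=+1 skips one base; shift=0 gives ordinary in-frame reading."""
    codons = []
    position = 0
    codon_index = 0
    while position + 3 <= len(mrna):
        codons.append(mrna[position:position + 3])
        position += 3
        if codon_index == shift_after_codon:
            position += shift
        codon_index += 1
    return codons


# 0 frame: cuu is followed by the rare arginine codon agg.
print(codons_after_shift("cuuaggc", 0, 0))
# +1 shift after cuu: the a is skipped and the next codon read is ggc.
print(codons_after_shift("cuuaggc", 0, 1))
```

in the shifted reading the slow-to-decode agg is never translated, which fits the observation that raising the level of the minor trna lowers frameshifting.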
however, no other rare codon will act as a stimulator; there must be something special about the interaction of the trnaarg with its codon (belcourt and farabaugh, 1990). thus the complete information for the high level shift (approaching 50%) in ty1 is contained within a 7 nucleotide stretch (cuu agg c) encompassing the shift site, and this is conserved from ty1 to ty2. interestingly, when this consensus sequence is placed within four codons of the translation start site, then shifting drops down almost 40-fold to background levels (belcourt and farabaugh, 1990). the reason for this is unknown, but might be explained on the basis of the three-site model for ribosomes with occupancy of the exit (e) site by a noninitiator trna affecting a site binding (rheinberger and nierhaus, 1986). by frameshifting e. coli rf2 uniquely causes release at uga. before the sequence became known, it was thought that a clever way to regulate the synthesis of rf2 would be to have an in-frame uga stop codon early in the gene. if there was an adequate level of rf2, termination would ensue and further rf2 would not be synthesized. if rf2 was limiting, then this would allow ugg-decoding trnatrp the chance to insert trp at the uga and the readthrough would permit decoding of the bulk of the gene to replenish the supply of rf2. as discovered by craigen et al. (1985), however, this model was only half correct. as introduced above, there is an in-frame uga terminator at codon 26, but a +1 frameshift is required to decode the main part (downstream) of the gene. the frameshifting can be at high levels (30% or more), which is higher than stop codon readthrough levels (except where selenocysteine is inserted; see below), and this may be one reason for the utilization of frameshifting rather than stop codon readthrough. of polyproteins generated by frameshifting many of the examples of frameshifting and readthrough have been found in viruses.
this may be due to the compactness of viral genomes, but is also likely to be a reflection of our relatively greater knowledge of the expression of viral genes. the first several examples required the ribosomes to shift frame near the end of a gene, and in the new frame to bypass the normal terminator to produce an elongated protein. the function, if any, of frameshifting in the initial example, the coat lysis hybrid of phage ms2, which is due to a +1 shift (atkins et al., 1979; beremand and blumenthal, 1979), remains unknown. the coat lysis hybrid is not incorporated into the virion, but the possibility that an equivalent frameshift product from the pseudomonas rna phage pp7 is incorporated is under investigation (garde et al., unpublished data). the functions of the elongated products from genes 5.5 and 10 of phage t7, which are due to -1 shifts (dunn and studier, 1983), remain unknown. however, the gene 10 shift occurs at a moderately high level and utilizes a distant 3' stimulator (condron et al., unpublished data). much more is known about the retroviral and related examples. the initial findings were with rous sarcoma virus, where 5% of the ribosomes translating gag shift to the -1 frame shortly before the terminator and enter pol (jacks and varmus, 1985). with hiv the situation is similar except that the level is higher, 12% (jacks et al., 1988b). with mmtv, in contrast, two -1 shifts are required, as the protease gene lies between the gag and pol genes. a -1 shift at 23% efficiency brings ribosomes to the protease gene, and a second shift at 8% near the end of the protease gene brings ribosomes to the pol gene to generate a gag-pro-pol fusion at an overall ratio to gag similar to that found with the single-shift situation with rous sarcoma virus (moore et al., 1987; jacks et al., 1987).
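the product ratios implied by these efficiencies can be worked out with a few lines of arithmetic. the sketch below assumes, purely for illustration, that successive shifts are independent and that every ribosome that fails to shift terminates at the next stop codon; the function name is hypothetical.

```python
def product_fractions(shift_efficiencies):
    """Given the efficiency of each successive frameshift, return the
    fraction of ribosomes that terminate after 0, 1, 2, ... shifts."""
    fractions = []
    remaining = 1.0  # fraction of ribosomes still translating
    for efficiency in shift_efficiencies:
        fractions.append(remaining * (1.0 - efficiency))  # these terminate here
        remaining *= efficiency                           # these shift and go on
    fractions.append(remaining)
    return fractions


# rous sarcoma virus: one shift at 5% -> gag, gag-pol
print(product_fractions([0.05]))
# mmtv: shifts at 23% and 8% -> gag, gag-pro, gag-pro-pol
print(product_fractions([0.23, 0.08]))
```

under these assumptions about 1.8% of the ribosomes translating mmtv gag (0.23 x 0.08) go on to make the gag-pro-pol fusion, the same order as the 5% single-shift figure for rous sarcoma virus, while roughly 21% stop after the protease gene.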
sequences similar to those responsible for frameshifting in these three retroviruses (see above) are found in the great majority of other sequenced retroviruses, making it likely that frameshifting (all -1) is used by them to make the gag-pol fusion polyprotein. (the alternative of stop codon readthrough is seen in murine leukemia virus and a few other retroviruses; see below.) the yeast "killer" particle uses a -1 frameshift mechanism similar to that of the retroviruses (icho and wickner, 1989; diamond et al., 1989). a -1 frameshift is used in the replicase gene of two coronaviruses, avian ibv and mouse hepatitis virus, to allow ribosomes to enter a region encoding a further, massive 350 kd of protein. it is likely also to be used by a number of (+)-stranded plant viruses (miller et al., 1988; veidt et al., 1988; xiong and lommel, 1989) and in at least one of these it is now being confirmed. there seems to be no significance in the frameshifting in the above examples being -1. many retrotransposons, retrovirus-like dna transposable elements that replicate via an rna intermediate during transposition, use +1 frameshifting. one example is the yeast ty elements, which utilize +1 frameshifting to generate a coat protein-polymerase fusion at an efficiency of approximately 20%. virus-like particles have been found for ty1 and at least one drosophila retrotransposon. in the drosophila element 17.6, frameshifting is inferred to occur by analogy with its ty equivalents, especially ty3 (hansen et al., 1988). with the bacterial is elements the product of the upstream gene insa binds to the terminal inverted repeat sequences of the is element. a -1 frameshift near the end of the insa gene yields an insa-insb fusion product that competes with the insa product for binding to the terminal sequences and, in addition, has transposase activity.
the level of frameshifting appears to determine the level of transposition (see sekine and ohtsubo, 1989), but whether the frameshifting level is responsive to stress or other conditions is unknown. double-stranded dna phages have also responded to the lure of shiftiness: in phage λ a relatively low level of -1 frameshifting (~2.5%) occurs at a double slippery codon near the end of gene g to generate a fusion product termed gt. gt is an essential phage tail component, and translation of the t open reading frame except as part of the fusion has not been detected (m. levin, r. hendrix, and s. casjens, unpublished data). frameshifting is the favored, though not the established, explanation for basal-level synthesis of outer membrane genes coding for major antigenic components of neisseria gonorrhoeae, with consequences for influencing the configuration of repeat elements present in these genes (belland et al., 1989). with the insertion of a "standard" amino acid for some time, the genomic rna from the rna phages was the only purified natural mrna available for translation studies, and it is not surprising that the first example of readthrough of a stop codon was found in one of these phages, qβ. readthrough of a leaky uga terminator at the end of the qβ coat protein gene results in tryptophan being inserted in response to the uga codon at the relatively low efficiency of ~3% (weiner and weber, 1973). readthrough results in a considerably elongated product that is incorporated into the virion and is essential for infectivity (hofstetter et al., 1974). in contrast, uag readthrough is required for synthesis of a subunit of the pili of toxigenic e. coli (jalajakumari et al., 1989). a well-known example of uag readthrough is found in the replicase gene of tobacco mosaic virus (pelham, 1978).
termination at the leaky uag stop codon results in the synthesis of a 126 kd protein, whereas readthrough (at an efficiency of 10%) leads to the synthesis of a 183 kd product essential for infectivity. the uag can be replaced by uaa without affecting infectivity (ishikawa et al., 1986). in tobacco mosaic virus and two unrelated viruses, beet necrotic yellow vein virus and turnip yellow mosaic virus, the sequences immediately surrounding the leaky uag stop codon are conserved, unlike the rest of the nearby sequences. the conserved sequences are caa uag caa uua. we wondered if the stop codon was hopped over by trnagln, rather than read through. however, skuzeski et al. (unpublished data) have evidence that rather than hopping being involved, there is a special context or stimulator for readthrough. a second group of plant rna viruses has different sequences flanking the leaky uag stop codon. this group, which includes maize chlorotic mottle virus (nutter et al., 1989), has the sequence aaa uag g. readthrough is also used by several (+)-stranded animal viruses of the alphavirus family, such as sindbis (strauss et al., 1983); interestingly, other members of the same family, such as semliki forest virus (takkinen, 1986), have a sense codon in place of the stop codon. in sindbis virus, stop codon readthrough (in the alphaviruses the stop codon utilized is uga) is involved in expression of one of the replicase constituent proteins. replacement of the uga stop codon by either a sense codon or one of the other stop codons has subtle deleterious effects on the virus (li and rice, 1989). it is suspected, but not established, that the polyprotein plays some role independent of its processed components. the first retrovirus found to have a "special" translation mechanism for the generation of its gag-pol fusion was murine leukemia virus, where there is in-frame readthrough of a leaky uag terminator at the end of gag (philipson et al., 1978; yoshinaka et al., 1985a).
a minority of other retroviruses, such as feline leukemia virus (yoshinaka et al., 1985b), are very likely to use a similar mechanism. the 10% readthrough efficiency of the uag terminator of the gag gene of murine leukemia virus in vivo is dependent on preservation of a strong stem-loop structure containing the uag in the loop (jones et al., 1989; see also panganiban, 1988). it is presumed that the stem-loop structure is required for the readthrough process, and so it is a stimulator in the sense used above, but the possibility has not been excluded that this structure is required solely for some other viral function. ten dam et al. (1990) have pointed out that the stem-loop containing the stop codon in the loop has not been preserved in feline leukemia virus; instead, one of the most stable potential stem-loop structures occurs just downstream of the uag codon and is capable of forming a pseudoknot. this putative stem-loop begins 8 nucleotides 3' of the stop codon, a similar distance from the shift site in those viruses utilizing frameshifting, and a similar structure may also be able to form in murine leukemia virus. an investigation of the requirements for uag readthrough in the feline leukemia virus context is clearly needed. as in the case of sindbis, the leaky uag of murine leukemia virus can be changed to one of the other terminators and readthrough is still observed (feng et al., 1989a; jones et al., 1989). it has been reported that infection with murine leukemia virus induces the synthesis of a minor trnagln, and the induction is inhibited by the antiviral compound avarol (kuchino et al., 1987, 1988). the stimulation has not been seen by others (feng et al., 1989b). whatever the resolution of this issue, the minor trnagln is probably not required for readthrough at the end of the gag gene.
with insertion of selenocysteine: mrna stimulator, special trna, and elongation factor recently it has been found that selenium is cotranslationally incorporated in protein in the form of selenocysteine in response to "special" uga codons. the best studied occurrences of selenocysteine are in formate dehydrogenase of e. coli (zinoni et al., 1986) and in glutathione peroxidase (chambers et al., 1986) of several mammalian species. selenocysteine is highly sensitive to oxidation, but its function is mostly unknown. preliminary studies have shown that formate dehydrogenase containing cysteine in place of selenocysteine has 4- to 5-fold lower specific activity (bock and stadtman, 1988). however, apart from the evolutionary questions raised by the cotranslational insertion of selenium, the main interest raised by this remarkable discovery is in how it occurs, and what is "special" about the uga codons that encode it. a minor uga-decoding seryl-trna in a variety of vertebrates has been known for many years (diamond et al., 1981). this trna can be converted to phosphoseryl-trna, which in turn (lee et al., 1989) can be converted to selenocysteyl-trnaser. however, in e. coli the conversion giving selenocysteine on trna does not occur by this mechanism (see leinfelder et al., 1990). regardless of the details, in both eukaryotes and prokaryotes selenocysteine is the 21st amino acid found to be directly encoded. it will be interesting to see if incorporation of other natural nonstandard amino acids can be engineered by mutational alterations of aminoacyl-trna synthetases. a unique trna for selenocysteine incorporation has also been characterized in e. coli. it has the anticodon 3'-acu-5', complementary to uga in the mrna, and an unusually long acceptor stem (schön et al., 1989, and references therein).
when the "special" uga triplet decoded by this trna is replaced by a ugc cysteine codon (but not by a uca codon with a second position change), the level of selenocysteine incorporation approaches that found when uga is present (see zinoni et al., 1990). the selenocysteine-inserting trna can therefore compete effectively with trnacys, as well as with rf2, when the mrna context is "special." the mrna signals that specify incorporation of selenocysteine at the "special" uga codon in formate dehydrogenase are partially known (zinoni et al., 1990). the 27 nucleotides 3' of the uga are crucial, and the efficiency of the process is influenced by the 12 nucleotides downstream (i.e., up to 39 nucleotides 3' of the uga) and 9 bases upstream of the uga. a stem-loop structure with 39 bases can be drawn immediately downstream of the uga in formate dehydrogenase mrna, and a similarly positioned stem-loop can be drawn immediately 3' of the "special" uga in human glutathione peroxidase mrna. whether 5' nucleotides in formate dehydrogenase mrna are involved in an alternate stem-loop also awaits testing. decoding of the "special" uga in e. coli formate dehydrogenase mrna also requires a unique elongation factor tu-like protein, selb, the product of the selb gene (see forchhammer et al., 1990). selb is considerably larger than ef-tu (68 vs. 43 kd), and it may have additional functions such as recognition of the aminoacyl residue, specific recognition of the mrna context around the selenocysteine-specific uga codon, and/or competition for the binding of rf2 (see forchhammer et al., 1990). the protein interacts with guanosine nucleotides and selenocysteyl-trna. it is intriguing to think that there may be families of elongation factors for specific purposes, perhaps analogous to multiple rna polymerase sigma factors. the eukaryotic analog of ef-tu is ef-1.
recently, studies on yeast mutants that promoted frameshifting revealed a new ef-1-like protein that is considerably larger than ef-1 (wilson and culbertson, 1988). this protein plays a role in coordinating translation with global cellular events such as progression through the cell cycle (kikuchi et al., 1988; see culbertson et al., 1990). surprisingly, there may even be families of different types of ribosomes in special circumstances, such as specific developmental stages in plasmodium (waters et al., 1989).

and readthrough versus splicing

the simple expedient of having promoters or translational start sites of different strengths can ensure a set ratio of two products. however, on occasion it is advantageous to have large amounts of one product (e.g., for structural purposes) and proportionally small amounts of another (e.g., for catalytic purposes), but to have the latter made as part of a fusion protein with the structural component at the 5' end of the polyprotein. most of the time, the ribosomes terminate at the end of the structural gene. however, by having a small proportion of messages with the stop codon spliced out (alternative splicing) or by its equivalent, circumventing the stop codon at the translational level, a small amount of the fusion polypeptide could be made. frameshifting and readthrough avoid one potentially deleterious consequence of splicing and also offer some potentially beneficial possibilities not provided by splicing.

there is one consideration peculiar to (+)-stranded rna viruses. since they use genomic (+)-stranded rna as a template both for protein synthesis and for replication, they need to avoid packaging spliced rna that would lead to the accumulation of defective viruses. many of them avoid rna splicing completely, but others restrict splicing to the generation of spliced rnas that do not contain the packaging site.
splicing is used, for example, to generate a subgenomic rna that contains the leader region fused to the env gene, but lacking the packaging site in the gag gene, which encodes the structural core ("coat") proteins.

interest in nonstandard decoding increased sharply with the finding that frameshifting or readthrough, but not splicing, is utilized to produce the gag-pol fusion polyproteins of retroviruses. in the retroviruses, there is no ribosome initiation at the beginning of the pol gene, and the gag-pol fusion is the only source of the catalytic pol products, reverse transcriptase and endonuclease. one reason for generating the gag-pol fusion may be to ensure the packaging of the polymerase by virtue of its attachment to the core gag proteins. a second may be to ensure also that the reverse transcriptase component of the pol product is inactive, by virtue of being part of a fusion polyprotein (witte and baltimore, 1978; felsenstein and goff, 1988), until the viral rna is sequestered by the core proteins. the ratio of gag-pol product to gag is rather critical, giving rise to hopes that compounds affecting the process of readthrough or frameshifting may be more detrimental to viral decoding than to the translation of any putative cellular genes that utilize either process.

interestingly, a minority of retrotransposons, and viruses such as hepatitis b virus (chang et al., 1989) and cauliflower mosaic virus (schultze et al., 1990), which are not retroviruses or retrotransposons but which use reverse transcriptase, do not utilize frameshifting or readthrough to generate their pol product. a comparison of their life cycles with those of the retroviruses and the majority of retrotransposons is helpful in discerning the reasons for readthrough or frameshifting (chang et al., 1989; schultze et al., 1990).
a minor alternative to splicing, readthrough, and frameshifting is posttranscriptional editing of some molecules of a particular mrna to generate an in-frame termination codon in the coding region. an example where only a single base is altered is apolipoprotein b, where the modification is tissue specific and subject to hormonal modulation.

roles

when frameshifting or readthrough brings ribosomes to a region downstream of the gene terminator, it may not be the protein product per se that is important, but rather the consequences of ribosome movement. translation of bacterial biosynthetic operon leader peptide genes provides a precedent for the role of ribosome movement itself being important. there is no role for the peptide product; only the act of its synthesis is important. an equivalent role for frameshifting has not been established, but has been proposed. the -1 frameshift product of the replicase gene of the rna phage ms2 detected in vitro (atkins et al., 1979) has not yet been seen in vivo, but if made, the shifted ribosomes may have a crucial role in influencing the progress of phage replicase (dayhuff et al., 1986). similarly, it may be that a function of ribosomes that shift during decoding of the mrna from the d gene of the single-stranded dna phage φx174 is to regulate lysis expression by unmasking mrna structure and permitting reinitiation (buckley and hayashi, 1987). (however, even though frameshifting was proposed to be involved in the normal mode of expression of phage ms2 lysis [kastelein et al., 1982], this was later shown to be incorrect for elaborate reasons [berkhout et al., 1987].) though complicated and only partially understood, the nonproduct role of translation of the leader peptide gene of the tryptophanase operon, which encodes an enzyme for the degradation of tryptophan, is likely to be intriguing (gollnick and yanofsky, 1990).
it may well be that ribosomes downstream of the terminator in some instances will have a role in influencing mrna half-life by disrupting secondary structure and thus the rate of degradation by rnases. in certain cases, such a device could be used to tie the timing of degradation to the number of times the message has been translated.

when shifted ribosomes encounter a stop codon in the new frame before bypassing the terminator in the original frame, they will synthesize a truncated product that will have some amino acids at its carboxy-terminal end not present in the 0 frame product. at an early stage this was suggested as the explanation for some plant viral products, but has not been substantiated. however, this is the explanation for the decoding of a 52 kd subunit of e. coli dna polymerase iii from the gene dnax, where the 0 frame encodes the 71 kd polymerase subunit. the -1 frameshift event occurs two-thirds of the way through the gene transcript at a 50% level, and the shifted ribosomes terminate early in the new frame to yield the 52 kd protein. both products are present at high levels in the polymerase complex. both subunits share a binding site for atp (or datp), but only the larger subunit has a dna-dependent atpase activity, presumably due to a dna binding site present in its carboxy-terminal domain (see tsuchihashi and kornberg, 1990). why the two different subunits are utilized and why frameshifting is involved remain intriguing questions. it has been proposed that the longer product is associated with the highly processive leading-strand half of the polymerase, while the shorter, frameshifting-derived product may be associated with the lagging-strand half (see flower and mchenry, 1990).

codons

in vitro protein synthesis experiments with an unperturbed mix of normal trnas have shown that e.
coli trna(ser) (anticodon 3'-[u]cg-5') and trna(thr) (anticodon 3'-[u]gg-5'), at an efficiency of a few percent, read the first 2 bases of gca alanine and ccg proline codons, respectively, to cause -1 frameshifting (atkins et al., 1979; dayhuff et al., 1986). this type of frameshifting is not a general property of trnas but, at least at a high level, may be unique to these two trnas. anticodon replacement experiments have shown that the shifting ability is a special property of the anticodon and not a peculiarity of the rest of the trna (bruce et al., 1986). increasing the ratio of trna(ser) or trna(thr) in proportion to the levels of the trnas that normally decode the gca alanine and ccg proline codons increases the level of frameshifting. this noncognate type of mechanism has been proposed to explain the frameshifting seen with φx174.

the alternative way to perturb the balance of aminoacylated trnas is to cause amino acid starvation (weiss and gallant, 1983). in some instances this also causes frameshifting, but in the cases analyzed it is due to cognate reading of an overlapping triplet codon rather than to noncognate reading of an in-frame doublet codon (weiss et al., 1988b). there is currently no evidence that frameshifting promoted by amino acid starvation is utilized in vivo, but it is not very different from the ty case, where a combination of rare codon and minor trna is the stimulator for frameshifting at a slippery codon.

before the mechanism of ty frameshifting was discovered, frameshifting promoted by tandem rare agg arginine codons had been investigated in e. coli. agg agg placed in a gene expressed from an efficient promoter on a high copy number plasmid showed +1 frameshifting at this sequence at an efficiency of up to 50% (spanjaard and van duin, 1988).
the shift occurs at the rare codons, but is dependent on extreme expression levels that may result in sequestration of the trna(arg) at the first of the two codons, so that the trna may be limiting even for the first codon and more so for the second codon.

normal frame maintenance

an estimate of the "background" level of frameshifting came from studies of the leakiness of frameshift mutants (atkins et al., 1972; kurland, 1979). however, some frameshift mutants are leaky at much higher levels than others, owing to the chance occurrence of nearby sequences prone to shiftiness (fox and weiss-brummer, 1980; atkins et al., 1983). a more revealing study has examined the level of frameshifting in a long natural sequence that is free of stop codons in one or another of the alternative frames. the value obtained for an aggregate of over 90 codons in each alternative frame was, at most, a few percent. interestingly, the level of shifting was not higher when frameshifting from the alternative frame to the wild-type 0 frame was monitored (weiss et al., 1990b). this result argues against a recently proposed (trifonov, 1987) intrinsic framing mechanism within the coding sequence.

one reason for isolating mutants that promote frameshifting (frameshift mutant suppressors) was to try to define the components responsible for frame maintenance. the first mutant sequenced had a trna anticodon enlarged by 1 base and caused a 4 base translocation (see riddle and carbon, 1973). however, several of the frameshift suppressors first isolated (riyasaty and atkins, 1968) have the normal 7 base anticodon loops (with their changes elsewhere in the trna) and yet cause both doublet and triplet reading (o'mahony et al., 1989). the trna mutants do not permit a simple answer to the question of frame maintenance (tuohy et al., 1990; see culbertson et al., 1990).
perhaps the most telling experiment is that reported by spirin (1987) , where under certain conditions trna was translocated in mrna-free ribosomes, implicating, as expected, trna as a principal component of normal mrna movement. however, as has been pinpointed by the suppressor studies, many other components of the translation apparatus, such as ef-tu and rrna, influence frame maintenance. codon-anticodon interaction involving only 3 bp is insufficiently strong in the absence of ribosomes to permit decoding. this poses a problem for considering the evolutionary origin of decoding, which necessarily took place before the advent of ribosomes with their numerous protein constituents. one possibility is that the original codonanticodon interaction involved 5 bp, which would confer the necessary stability. woese (1970) and crick et al. (1976) proposed a model for how this may happen without disastrous consequences in subsequently changing to triplet codon-anticodon pairing on ribosomes. in their reciprocating ratchet model, the trna paired with 5 codon bases but only a 3 base codon was decoded. a modified form of this model (weiss, 1984) seemed an attractive explanation for several of the anomalous results obtained with the hungry codon and noncognate frameshifting studies described in the last section, but further work has shown that this model is unlikely (bruce et al., 1986; weiss et al., 1988b) . since then, a different type of explanation for the dilemma of the origin of decoding has been advanced. it has been proposed that ribosomal rna, without protein involvement, can stabilize codon-anticodon interaction by coaxial stacking of ribosomal rna (noller et al., 1986) . such stacking could, by strengthening codon-anticodon interaction, also help in maintaining the reading frame. high levels of frameshifting programmed by signals in the mrna are a far cry from the once widely held view (see whitfield et al., 1966) that decoding is invariably sequentially triplet. 
even now, the increasingly common recurrence of the retroviral type -1 frameshifting might lead one to think that there is a narrowly limited number of mechanisms and circumstances where efficient ribosomal frameshifting occurs. however, the variety of new examples of single-base frameshifting, as well as hopping and high level readthrough, makes it likely that nature has many tricks in store. the stimulatory effect of rna-rna interactions, often as stem-loop structures, in enhancing an unusual translation event by elongating ribosomes is an interesting and new feature. a tempting thought is that the diversity revealed is a reflection of the varied mechanisms used to cause pausing, but there is clearly much more involved. the novelty and intricacy of the unfolding insights into these phenomena reinforce the view that translational elongation and termination are proving to be no exceptions to the rich versatility being revealed in nearly every aspect of gene expression.

atkins, j. f., nichols, b. p., and thompson, s. (1983). the nucleotide sequence of the first externally suppressible -1 frameshift mutant, and of some nearby leaky frameshift mutants. embo j. 2, 1345-1350.
belcourt, m. f., and farabaugh, p. j. (1990). ribosomal frameshifting in the yeast retrotransposon ty: trnas induce slippage on a 7 nucleotide minimal site. cell 62, 000-000.
chambers, i., frampton, j., goldfarb, p., affara, n., mcbain, w., and harrison, p. r. (1986). the structure of the mouse glutathione peroxidase gene: the selenocysteine in the active site is encoded by the "termination" codon, tga. embo j. 5, 1221-1227.
chang, l.-j., pryciak, p., ganem, d., and varmus, h. e. (1989). biosynthesis of the reverse transcriptase of hepatitis b viruses involves de novo translational initiation not ribosomal frameshifting. nature 332, 364-368.
clare, j., and farabaugh, p. (1985). nucleotide sequence of a yeast ty element: evidence for an unusual mechanism of gene expression. proc. natl. acad. sci.
usa 82, 2629-2633. craigen. w. j., cook, r. g., tate. w. p, and caskey, c. t. (1965) . bacterial peptide chain release factors: conserved primary structure and low activity of p-galactosidase in frameshift mutants of escherichis co/i and organization of barley yellow dwarf virus genomic rna intermediate states in the movement of transfer rna in the ribosome studies on the structure and function of ribosomal rna. in structure, function and the complete nucleotide sequence of the maize chlorotic mottle virus genome trna hopping: enhancement by an expanded anticodon glycine trna mutants with normal anticodon loop size cause -1 frameshifting retroviral gag gene amber codon suppression is caused by an intrinsic &-acting component of the viral mrna leaky uag termination codon in tobacco mosaic virus rna translation of mulv and msv rnas in nuclease-treated reticulocyte extracts: enhancement of the gag-pal polypeptide with yeast suppressor trna transposition in shigella: isolation and analysis of 15911, a new member of the is3 group of insertion sequences allosteric interactions between the ribosomal transfer rna-binding sites a and e nucleotide sequence of beet western yellows virus rna developmental regulation of stage-specific ribosome populations in plasmodium a single uga codon functions as a natural termination signal in the coliphage qp coat protein cistron molecular model of ribosome frameshifting mechanism of ribosomal frameshifting during translation of the genetic code slippery runs, shifty stops, backward steps, and forward hops: -2, -1, +l, +2, +5, and +6 ribosomal frameshifting. cold spring harbor symp reading frame switch caused by base-pair formation between the 3'end of 16s rrna and the mrna during elongation of protein synthesis in escherichia co/i on the mechanism of ribosomal frameshifting at hungry codons e. 
co/i ribosomes re-phase on retroviral frameshift signals at rates ranging from 2 to 50 percent a nascent peptide is required for ribosomal bypass of the coding gap in bacteriophage t4 gene 60 ribosomal frameshifting from -2 to +50 nucleotides the leader peptide of yeast gene cpa7 is essential for the translational repression of its expression classification of aminotransferase (c gene) mutants in the histidine operon suf72 suppressor protein of yeast. a fusion protein related to the ef-1 family of elongation factors hiv expression strategies: ribosomal frameshifting is directed by a short sequence in both mammalian and yeast systems we thank diane dunn, norma wills, and, at trinity college dublin, shahla thompson, for their major contributions. possibleframeshift regulation of release factor 2. proc. natl. acad. sci. usa 82, 3616-3620. crick, f. h. c., brenner, s., klug, a., and pieczenik, g. (1976) . aspeculation on the origin of protein synthesis.origins of life 7, 369-397.culbertson, m. r., leeds, p, sandbaken, m. g., and wilson, p. g. (1990) . frameshift suppression.in ribosomes: structure and function, w. hill, p. moore, r. garrett, j. warner, a. dahlberg, and d. schlessinger, eds. (washington, dc. : american society for microbiology), in press. curran, j. f., and yarus, m. (1966) . use of trna suppressors to probe regulation of escherichia co/i release factor 2. j. mol. biol. 203, 75-83. dahlberg, a. e. (1989) . the functional role of ribosomal rna in protein synthesis.cell 57, 525-529.dayhuff, t. j., atkins, j. f., and gesteland, r. f. (1986) . characterization of ribosomal frameshift events by protein sequence analysis. j. biol. chem. 267, 7491-7500. diamond, a., dudock, b., and hatfield, d. (1981) . structure and properties of a bovine liver uga suppressor serine trna with a tryptophan anticodon.cell 25 z., and kornberg, a. (1990) . translational frameshifting generates they subunit of dna polymerase ill holoenzyme. proc. natl. acad. sci. 
usa 87, 2516-2520.
tuohy, t. m. f., thompson, s., hughes, d., gesteland, r. f., and atkins, j. f. (1990). the role of eftu and other translation components in determining translocation step size. biochim. biophys. acta, in press.
witte, o. n., and baltimore, d. (1976) in press, 1990). for a discussion of frameshifting versus splicing, see also wickner (faseb j. 3, 2257-2285, 1989).

this work was supported by the howard hughes medical institute and national institutes of health grant 12295c-02.

key: cord-008613-tysyq6o4 authors: thomas, sheila m.; lamb, robert a.; paterson, reay g. title: two mrnas that differ by two nontemplated nucleotides encode the amino coterminal proteins p and v of the paramyxovirus sv5 date: 1988-09-09 journal: cell doi: 10.1016/s0092-8674(88)91285-8 sha: doc_id: 8613 cord_uid: tysyq6o4

the "p" gene of the paramyxovirus sv5 encodes two known proteins, p (mr ≈ 44,000) and v (mr ≈ 24,000). the complete nucleotide sequence of the "p" gene has been obtained and is found to contain two open reading frames, neither of which is large enough to encode the p protein. we have shown that the p and v proteins are translated from two mrnas that differ by the presence of two nontemplated g residues in the p mrna. these two additional nucleotides convert the two open reading frames to one of 392 amino acids. the p and v proteins are amino coterminal and have 164 amino acids in common. the unique c terminus of v consists of a cysteine-rich region that resembles a cysteine-rich metal binding domain. an open reading frame that contains this cysteine-rich region exists in all other paramyxovirus "p" gene sequences examined, which suggests that it may have important biological significance.

in recent years the catalog of mechanisms identified as having a role in the processing or modification of the initial rna transcripts to yield mature mrnas has increased markedly.
in eukaryotic cells the most common process found is splicing of the primary rna transcript (padgett et al., 1986). less common variations on this splicing theme are alternative splicing, such that exons are excluded from some, but not all, of the mature mrna, giving rise to sequence diversity in the encoded proteins (for review, see breitbart et al., 1987), and trans splicing, in which two independently transcribed rnas are ligated to form the mature mrna (konarska et al., 1985; solnick, 1985; murphy et al., 1986; sutton and boothroyd, 1986; krause and hirsh, 1987; koller et al., 1987). mechanisms of rna transcript modification other than splicing include the phenomenon of rna editing in mitochondrial transcripts from trypanosomes, which is characterized by the presence in the mature mrna of uridine residues that are not encoded in the gene (benne et al., 1986; feagin et al., 1987, 1988; shaw et al., 1988). in addition, a process related to rna editing is thought to occur in primary transcripts of the mammalian apolipoprotein-b gene, as two discrete mrnas have been found, one of which has a u residue in place of a templated c (powell et al., 1987; chen et al., 1987).

in animal virus infected cells, many examples of spliced and alternatively spliced viral mrnas have been identified. in addition, an unusual cotranslational modification has been identified in vaccinia virus late transcripts that possess a poly(a) leader at the 5' end not encoded by the virus genome (bertholet et al., 1987; schwer et al., 1987). the 5' poly(a) region is thought to be added by the vaccinia polymerase "stuttering" at a series of t residues on the template dna (schwer and stunnenberg, 1988). in many virus systems, in addition to modification of the primary rna transcripts, the maximum protein coding potential on an mrna is exploited by the use of alternative translation strategies that also have a potential role in the regulation of viral gene expression.
such mechanisms include translation from overlapping reading frames as observed for the coat, lysis, and synthetase proteins of bacteriophage ms2 (atkins et al., 1979) and the adenovirus e1b proteins (bos et al., 1981); ribosomal frameshifting that occurs to yield the gag-pol fusion proteins of rous sarcoma virus, human immunodeficiency virus, or mouse mammary tumour virus (jacks and varmus, 1985; jacks et al., 1987; varmus, 1988), and that also occurs in the polymerase encoding region of infectious bronchitis virus (boursnell et al., 1987; brierley et al., 1987); and the use of suppressor trnas to overcome translation termination during translation of the gag-pol fusion protein in moloney murine leukemia virus (yoshinaka et al., 1985) and the nsp4 protein of sindbis virus (strauss et al., 1983).

in negative strand rna viruses, several of the processes discussed above are involved in the regulation of viral gene expression. for instance, in influenza a and b viruses, both spliced and unspliced mrnas that are translated to yield polypeptides from overlapping reading frames have been identified (lamb and lai, 1980; briedis and lamb, 1982), and influenza a viruses also provide an example of alternatively spliced mrnas (lamb et al., 1981). translation from overlapping reading frames on functionally bicistronic mrnas has been shown to be the mechanism used to yield the na and nb glycoproteins of influenza b virus (shaw et al., 1983) and the p and c proteins of the paramyxoviruses sendai virus and parainfluenza virus 3 and the morbillivirus, measles virus (giorgi et al., 1983; gupta and kingsbury, 1985; curran et al., 1986; luk et al., 1986; galinski et al., 1986; spriggs and collins, 1986; bellini et al., 1985).
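as a rough illustration of what "overlapping reading frames" means in practice, the following python sketch translates one nucleotide string in all three forward frames and lists the aug-initiated open reading frames it finds. the sequence `TOY_MRNA` is invented for this example and is not taken from any of the viruses discussed; the function names `translate` and `open_reading_frames` are likewise assumptions made here, and only the standard genetic code is considered (no suppression, frameshifting, or editing).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate seq (DNA or RNA letters) starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

def open_reading_frames(seq, min_codons=2):
    """Return (frame, start_nucleotide_index, peptide) for each ORF that
    begins with a methionine codon and ends at an in-frame stop codon."""
    found = []
    for frame in range(3):
        protein = translate(seq, frame)
        start = None
        for i, aa in enumerate(protein):
            if aa == "M" and start is None:
                start = i                      # first AUG opens the ORF
            elif aa == "*" and start is not None:
                if i - start >= min_codons:
                    found.append((frame, frame + 3 * start, protein[start:i]))
                start = None                   # stop codon closes it
    return found

# Hypothetical toy sequence with one ORF in frame 0 and an overlapping
# ORF in frame +1 (not a real viral sequence).
TOY_MRNA = "ATGAATGGCCTGGCCTAAGCATTAA"

for frame, start, peptide in open_reading_frames(TOY_MRNA):
    print(f"frame +{frame}, starts at nucleotide {start + 1}: {peptide}")
```

in this toy case the second open reading frame starts inside the first and continues past its stop codon, which is the arrangement the text describes for functionally bicistronic mrnas.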
simian virus 5 (sv5), a prototype paramyxovirus, has a single-stranded, negative sense genomic rna (vrna) approximately 15,000 nucleotides in chain length that is transcribed in infected cells by the virion-associated rna transcriptase to yield virus-specific mrnas. the sv5 "p" gene has been shown to encode both the p protein (mr = 44,000), and protein v (mr = 24,000) by the arrest of translation in vitro of both p and v using a cdna clone derived from sv5-specific mrnas (paterson et al., 1984). in addition, the p and v proteins have been shown to have tryptic peptides in common, although no precursor-product kinetics could be demonstrated (paterson et al., 1984; our unpublished data

figure 1. nucleotide sequence of clone p203-1 in the mrna sense and the predicted amino acid sequence of the two open reading frames [sequence listing not reproduced]. nucleotide 1 is the 5' terminal nucleotide of the p and v mrnas.
after nucleotide 1298 there is a stretch of a residues in clone p203-1 (not shown) that is thought to represent part of the poly(a) tail on the mrna. the amino acid numbering of the +1 reading frame is adjusted to conform with the residues predicted to exist in the p protein (see figure 7b ). the sequence has been deposited in the embl/genbank data base (accession no. j03142). and is part of the transcriptase complex (buetti and choppin, 1977) , while protein v is found in infected cells and is of unknown function (peluso et al., 1977) . the mechanism by which both the sv5 p and v proteins are encoded by a single gene has been investigated and we report here that p and v are amino coterminal proteins with different c-termini and are encoded by two separate mrnas that differ by two nontemplated nucleotides. to investigate the coding strategy used to express both the p and v proteins from a single gene on the sv5 virion rna, the complete nucleotide sequence of three independently derived cdna clones (p203-1, p10, and p127) was obtained using both dideoxynucleotide chain-terminating and chemical sequencing methods. the nucleotide sequence of clone p203-1 is presented in figure 1 in the mrna sense. it is 1298 nucleotides in length and contains an untranslated region of 60 nucleotides preceding the first aug codon at nucleotides 61-63. primer extension nucleotide sequencing on mrna isolated from sv5infected cells indicated that nucleotide 1 is the 5' terminal nucleotide of the mrna (data not shown). at the 3' end after nucleotide 1298 in the different clones, there is a stretch of a residues of variable length. the open reading frame following the first aug codon (reading frame 0) is capable of encoding a protein of 222 amino acids. in the +1 reading frame there is an overlapping open reading frame of 250 amino acids, as illustrated schematically in figure 2 . 
although either reading frame could encode protein v (mr ≈ 24,000), neither of the open reading frames is apparently large enough to encode p, a protein of mr ≈ 44,000 (paterson et al., 1984), assuming the electrophoretic mobility of p is not aberrant. examination of the predicted amino acid sequences indicates a region from residues 190-218 (29 amino acids) in the 0 reading frame that contains 7 cysteine residues, whereas the +1 reading frame only contains 1 cysteine at residue 357. conversely, the +1 reading frame contains 6 methionine residues, whereas the 0 reading frame contains only one methionine, in addition to the initiation methionine.

to facilitate the elucidation of the coding strategy for p and v, we used monoclonal antibodies specific for the sv5 p and v proteins, which were generously made available by dr. rick randall (randall et al., 1987). the p and v monoclonal antibodies have been assigned to three groups, members of which recognize three nonoverlapping antigenic sites (r. randall, personal communication), here designated groups i, ii, and iii. the proteins immunoprecipitated from sv5 infected cell lysates by a representative member of each group are shown in figure 3 (left section). it can be seen that while p is immunoprecipitated by antibodies from all three groups, protein v is only recognized by the monoclonal antibody from group i. recognition of both p and v by group i monoclonal antibodies indicates that they have amino acid sequences in common. the identity of the protein indicated by a star, migrating in a po

cysteine. right section: immunoprecipitation of sv5-infected cell lysates using the group i monoclonal antibody. it can be seen in this immunoprecipitation that some np coprecipitates with p and v using the group i antibody, probably because l, np, p, and v exist in a complex (randall et al., 1987; our unpublished data).
hn = hemagglutinin-neuraminidase protein, mr ≈ 70,000; np = nucleoprotein, mr = 61,000; p = phosphoprotein, mr = 44,000; m = matrix protein, mr = 38,000; v = protein v, mr = 24,000; * = polypeptide of unknown origin.

figure 4. antigenic region mapping of the p and v monoclonal antibodies on the p and v gene using t7 rna polymerase runoff transcripts, in vitro translation, and immunoprecipitation. the full-length p203-1 dna insert cloned using xbai linkers (xx) and a hindiii to xbai fragment (hx) containing the 3' two-thirds of the p203-1 dna were placed under the control of the t7 promoter in the plasmid pgem-2. the template dna was linearized downstream of the t7 promoter and the insert dna by endonuclease digestion using ecori (xx and hx) or, for truncated transcripts, by digestion of the full-length insert with avaii (xa) or clai (xc), which cut at sites within the protein coding region. the runoff transcripts were translated in vitro using a rabbit reticulocyte lysate and the [35s]methionine labeled proteins immunoprecipitated with the p specific monoclonal antibodies using the method described (erickson and blobel, 1979). (a) the immunoprecipitated in vitro translation products were analyzed by sds-page on 15% gels except for the 4 lanes at the far right, where a 10% polyacrylamide gel was used (see text). u = uninfected cell lysate labeled with tran[35s]label; i = infected cell lysate labeled with tran[35s]label as a marker lane; iv = proteins translated from poly(a)-containing mrna isolated from sv5 infected cells; c = in vitro translation carried out in the absence of added mrna; gp i, gp ii, and gp iii indicate the monoclonal antibody used to immunoprecipitate the proteins observed in the region of the gel defined by the vertical lines either side of the antibody.
lanes xx = rna transcribed from full-length cloned dna; lanes hx = rna transcribed from hindlll to xbal 3' two-thirds fragment; lanes xa = rna transcribed from xbal to avail dna fragment; lanes xc = rna transcribed from xbal to clal dna fragment. x = protein p, + = protein v, • = protein consistent in size with internal initiation at met. 183 (210aa product in figure 4b ), 1~ = protein consistent in size with internal initiation at met. 233 (116aa product in figure 4b ), 2~ = protein consistent in size with internal initiation at met. 277, 3~ and 4~ = truncated products with protein synthesis initiated at met. 1 (157aa and 98aa products respectively in figure 4b ). (b) schematic representation of the data accumulated in figure 4a showing the protein products derived from the 0 and +1 reading frames, their relative number of amino acids assuming that all protein products from the +1 reading frame are initiated from internal methionine residues, the restriction endonuclease sites used in the generation of the rnas, and a summary of the protein product reactivity with the gp i, ii, and iii monoclonal antibodies. sition between that of p and v, has not been investigated. to examine the relative ability by which the p and v proteins could be selectively radiolabeled, sv5-infected cells were labeled with either [35s]methionine or [35s]cysteine ( figure 3 , middle section). in addition, cell lysates were immunoprecipitated with the group i antibody (figure 3 , right section). as shown in figure 3 , protein v was easily detected when labeled with either methionine or cysteine, whereas the p protein, although readily labeled with methionine, was poorly labeled with cysteine. these observations suggest that protein v possesses the cysteine-rich region encoded by the 0 reading frame, while the p protein apparently does not. 
to define the regions of the nucleotide sequence encoding the p and v proteins, we used the approach of making synthetic mrna transcripts, translating the rnas in vitro, and immunoprecipitating the products. the complete coding regions of clone p203-1 (nucleotides 44-1285) and a fragment containing the 3' two-thirds of the gene were subcloned into the transcription vector pgem-2 such that they were under the control of the t7 rna polymerase promoter. a series of mrna-sense runoff transcripts were prepared as described in experimental procedures, translated in vitro using a rabbit reticulocyte lysate, and immunoprecipitated using the group i, ii, and iii monoclonal antibodies. the results obtained from such an assay are shown in figure 4a and summarized in schematic form in figure 4b. interestingly, although the p protein could be translated in vitro using poly(a)-containing mrnas isolated from sv5-infected cells (figure 4a, indicated as x in lanes iv), we were unable to detect the synthesis of the p protein when in vitro runoff transcripts were used to program the cell-free translation system. however, protein v was translated from both synthetic rnas and mrna isolated from infected cells (figure 4a, indicated as +). originally we thought it likely that frameshifting might be involved in the synthesis of p. however, because there is no detectable synthesis of the p protein when t7 runoff transcripts are used to program the rabbit reticulocyte lysate, whereas p is translated efficiently in vitro when poly(a)-containing mrna from infected cells is used, it would seem unlikely that ribosomal frameshifting is involved in the generation of the p protein. in addition to protein v, other in vitro synthesized proteins were observed, particularly during translation of the 3' two-thirds of the gene.
the apparent size of the additional proteins is consistent with initiation of protein synthesis occurring at internal aug codons in the +1 open reading frame: methionines 183 (figure 4a, closed circle), 233 (figure 4a, 1→), and 277 (figure 4a, 2→). from the sizes of the different protein products derived from the various runoff transcripts, it was possible to map unambiguously the regions on the open reading frames that were recognized by the monoclonal antibodies. in this way, monoclonal antibodies from group i were found to recognize the n-terminal region of the 0 open reading frame, group ii antibodies to recognize a region from the n terminus of the +1 open reading frame, and group iii antibodies to recognize a c-terminal region of the +1 open reading frame. in this analysis it should be noted that the largest in vitro translation product recognized by group ii and iii antibodies (figure 4a, indicated by a closed circle) was very similar in size to protein v (210 amino acids versus 221 amino acids) and had an almost identical electrophoretic mobility on sds-page. however, these polypeptides could be resolved when the samples were analyzed on a 10% polyacrylamide gel (figure 4a, right four lanes). thus, because p and v are both immunoprecipitated by the gp i monoclonal antibodies, whereas v cannot be immunoprecipitated by the gp ii and gp iii monoclonal antibodies, these data indicate that p and v are amino coterminal and that protein v must be the product of the 0 open reading frame. in addition, these data indicate that protein p is derived from amino acid residues encoded by a large part of the +1 reading frame. to obtain additional evidence that protein v is encoded by the 0 open reading frame and that the stop codon (taa, nucleotides 727-729) that terminates translation in the 0 reading frame is not an artifact of the cdna cloning procedure, we used an approach involving site-specific mutagenesis.
if this stop codon is used to terminate translation of protein v, its elimination should prevent a normal-sized protein v from being synthesized, and a larger protein of 254 amino acids should be found. nucleotides 727-729 (taa) in the cloned dna were changed to the triplet gcg (encoding alanine) as described in experimental procedures, the dna containing the mutation was transcribed in vitro, and the resulting synthetic rna was translated in a rabbit reticulocyte lysate. as shown in figure 5a, lanes 4 and 5, a protein (v*) larger than v (figure 5a, lane 3) that was recognized by the group i monoclonal antibody (figure 5b, lanes 1 and 2) was synthesized. as no evidence for frameshifting could be obtained (p could not be translated in vitro from t7 transcripts of p203-1 cloned dna), the most plausible mechanism by which p and v are encoded is that a second mrna species exists that is translated to yield the p protein. an insertion or deletion in such an mrna would be expected to occur in the region of overlap between the two reading frames shown in figure 2. to search for the existence of a second mrna population, nuclease s1 protection analysis was performed using poly(a)-containing mrna from sv5-infected cells and two dna fragments from clone p203-1 that spanned the entire region of overlap, one uniquely 3' end-labeled at nucleotide 42 and the other uniquely 5' end-labeled at nucleotide 892.

figure 6 (legend, continued): ... (lamb and lai, 1982); f, control untreated probe; 0, no added mrna; 1-100, increasing mrna concentrations in the ratio 1:5:25:50:100. numbers on the left of each panel are nucleotide sizes. a schematic diagram of the probe-protected products, their nucleotide sizes, and the position of the uniquely labeled end, which is indicated by a star, is shown beneath the autoradiograms.
with each fragment, two nuclease s1-protected labeled fragments were detected: the full-length fragment used as a probe, corresponding to a colinear mrna transcript, and a smaller fragment present in 10%-20% abundance (data not shown). these data suggest that a second mrna species exists that is derived from the p and v gene on the sv5 virion rna and that it has a nuclease s1-sensitive site approximately between nucleotides 530 and 560. to define the location of this site more accurately, two shorter dna fragments were used for the nuclease s1 analysis. one dna fragment (nucleotides 434 to 660) was 3' end-labeled at nucleotide 434 and the other dna fragment (nucleotides 438 to 642) was 5' end-labeled at nucleotide 642. in addition to protection of the probe fragments corresponding to a colinear mrna transcript, both probes protected smaller fragments found in 10%-20% abundance (figure 6). with both the 3' end-labeled and 5' end-labeled probes, the smaller protected fragments (98 and 92 nucleotides, respectively) increased in abundance with increasing concentrations of mrna (figure 6). the sizes of the protected fragments mapped the nuclease s1-sensitive region to between nucleotides 532 and 550. a cdna library derived from sv5-infected cv1 cell mrnas (paterson et al., 1984) was screened with an oligonucleotide probe to isolate cdna clones specific for the p and v mrnas. the nucleotide sequence of 22 p- and v-specific cdna clones was obtained over the region of overlap between the 0 and +1 reading frames. in addition to cdna clones having the same sequence as p203-1 (12 clones), a second population of cdna clones was isolated (10 clones) that differed from p203-1 in containing two additional bases between nucleotides 548-551. the nucleotide sequence over the relevant region of a p mrna clone and a v mrna clone is shown in figure 7a.
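the mapping arithmetic in the preceding paragraph can be checked with a short sketch. the python below is illustrative only: the function name `s1_cut_site` and its `direction` argument are assumptions introduced here, and the calculation ignores off-by-one conventions and the imprecision of sizing fragments on gels. it adds the protected length to the labeled position for the 3' end-labeled probe (labeled at the lower-numbered end) and subtracts it for the 5' end-labeled probe (labeled at the higher-numbered end).

```python
def s1_cut_site(labeled_nt: int, protected_len: int, direction: int) -> int:
    """Estimate where nuclease S1 cut the probe.

    labeled_nt    -- nucleotide number carrying the unique end label
    protected_len -- size in nucleotides of the smaller protected fragment
    direction     -- +1 if the probe extends toward higher nucleotide numbers
                     from the label, -1 if toward lower numbers
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return labeled_nt + direction * protected_len


# Values quoted in the text: probe 3' end-labeled at nt 434 gives a 98 nt
# fragment; probe 5' end-labeled at nt 642 gives a 92 nt fragment.
from_3prime_label = s1_cut_site(434, 98, +1)
from_5prime_label = s1_cut_site(642, 92, -1)
window = (min(from_3prime_label, from_5prime_label),
          max(from_3prime_label, from_5prime_label))

# The run of G residues reported at nucleotides 548-551.
g_run = (548, 551)
overlaps_g_run = window[0] <= g_run[1] and g_run[0] <= window[1]

print(window, overlaps_g_run)
```

under these assumptions the two estimates reproduce the 532-550 window given in the text, and the upper boundary falls within the run of g residues at nucleotides 548-551.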
it can be seen that whereas the v cdna (p203-1) has four g residues between nucleotides 548 and 551, the p cdna has six g residues (sections p and v, figure 7a). although the simplest explanation of the nuclease s1 mapping data was that the p mrna was a noncolinear transcript of the p and v gene containing an 18 nucleotide interrupted region, our retrospective explanation for the data is that nuclease s1 recognized a two nucleotide mismatch. the 3' break point maps precisely to the 4 g region in the v cdna clone, while the 5' site did not. in addition to the inherent inaccuracies in measuring the precise size of dna fragments, it is noted that the region 5' to the 4 g residues at nucleotides 548-551 is at-rich and it may have been sensitive to digestion by the nuclease s1 (hansen et al., 1981). the two extra g residues cause a switch from the 0 reading frame to the +1 reading frame, and the predicted amino acid sequences are shown in figure 7b. these data indicate that the p mrna has the capacity to encode a polypeptide of 392 amino acids initiating at the aug codon at nucleotides 61-63 and terminating at the tga codon at nucleotides 1237-1239. to determine whether the genomic virion rna from which the sv5 mrnas are transcribed contains four or six c residues complementary to the four or six g residues found in the two mrnas, the sequence of the virion rna (vrna) was obtained as described in experimental procedures. as shown in figure 7 (section vrna), only a single cdna sequence could be detected, and it contained four g residues complementary to four c residues in the vrna. to provide further evidence that the only difference between a p mrna and a v mrna is the presence of two nontemplated g residues, we investigated whether the p protein could be translated from a synthetic rna derived from a p cdna.
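the effect of two extra g residues on the reading frame can be illustrated with a toy sequence. the sketch below is not the sv5 sequence: `V_LIKE` is a hypothetical 30-nucleotide string invented here so that a run of four g residues sits just upstream of a stop codon, and the helper names (`translate`, `insert_into_run`, `orf_residues`) are likewise assumptions. the standard genetic code is used. the last helper only repeats the arithmetic implied by the coordinates quoted above (aug at 61-63, tga at 1237-1239).

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate from the first nucleotide until a stop codon or the end."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_into_run(seq: str, run: str = "GGGG", extra: str = "GG") -> str:
    """Lengthen the first occurrence of `run` by `extra` (toy model of the insertion)."""
    pos = seq.index(run)
    return seq[:pos] + extra + seq[pos:]


def orf_residues(first_nt_of_start: int, last_nt_of_stop: int) -> int:
    """Number of amino acids encoded between a start and a stop codon (stop excluded)."""
    return (last_nt_of_stop - first_nt_of_start + 1) // 3 - 1


# Hypothetical toy sequence: ATG GCT CCA GGG GAA TGC TAA CTG AAA TAG
V_LIKE = "ATGGCTCCAGGGGAATGCTAACTGAAATAG"
P_LIKE = insert_into_run(V_LIKE)

v_protein = translate(V_LIKE)   # terminates at the in-frame TAA
p_protein = translate(P_LIKE)   # reads through into the other frame

print(v_protein, p_protein, orf_residues(61, 1239))
```

in this toy case the two products share their n-terminal residues up to the g run and then diverge, with the product of the lengthened sequence running past the original stop codon, which is the same qualitative behavior described for v and p. the final value printed is the 392 residues stated in the text for the p open reading frame.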
to facilitate the genetic manipulation, an internal large restriction fragment spanning the region of interest in the p203-1 pgem-2 vector was replaced with the comparable fragment from the p cdna clone. in addition to using a "natural" p cdna, we also changed the v cdna clone p203-1 by site-specific mutagenesis to insert two additional g residues into the four g residues at nucleotides 548-551. synthetic rna was transcribed from both the "natural" and the "synthetic" p cdna clones using t7 rna polymerase, and translated in vitro in rabbit reticulocyte lysates. as shown in figure 8, both the "natural" and "synthetic" p cdna clones yielded a p protein with an electrophoretic mobility identical to that of the p protein synthesized in infected cells and to the p protein translated in vitro from sv5-infected cell mrna. all the p proteins could be immunoprecipitated with the group i, ii, and iii monoclonal antibodies (data not shown). the protein products found in figure 8, lanes 4-6, indicated by an arrow and a dot, are thought to be internal initiation products from methionine residues 141 and 183, respectively. protein v (figure 8, lanes 3 and 7) is of a slightly different electrophoretic mobility from the smaller internal initiation product, and only protein v and not the internal initiation products are precipitated by the group i monoclonal antibodies (data not shown). the finding of two extra g residues at a precise location in the p mrna suggests that a signal would be needed to specify their addition, such as a region of strong secondary structure in the vrna or mrna. with the aid of the computer program fold (intelligenetics inc., palo alto, ca), the most stable secondary structure that can be predicted for nucleotides 520-620 of the p and v gene is one with an energy of ΔG = -53.7 kcal/mol that has the four templated c residues (nucleotides 548-551) immediately after a base-paired stem region (figure 9). the klenow fragment of e. coli dna polymerase often yields artifactual sequencing bands at a run of several g residues when directly sequencing double-stranded dna using the dideoxy chain-terminating method.

figure 7 (legend, continued): the nucleotide sequences of a p cdna clone and a v cdna clone in the region of nucleotides 541-564 are shown to illustrate the six g or four g residues in the p cdna and v cdna, respectively. sequencing was done by the chemical cleavage method (maxam and gilbert, 1980). the sequence of the sv5 genomic template rna (vrna) is shown in the message sense as determined by dideoxy primer extension sequencing using reverse transcriptase (air, 1979). (b) the predicted amino acid sequence of the p and v proteins in the region of the six g or four g residues.

figure 8. expression of the p protein from in vitro synthesized rna. a p cdna clone was reconstructed in the p203-1 pgem-2 vector by replacing an internal large psti dna fragment (nucleotides 225-660) with that from a p cdna containing the two nontemplated g residues between nucleotides 548-551. the p203-1 dna was also changed by site-specific mutagenesis to insert two additional g residues into the four g residues at nucleotides 548-551, and the mutated dna subcloned into the pgem-2 vector. rna was transcribed with t7 rna polymerase from both the "natural" and the "synthetic" p cdna clones and translated in vitro using rabbit reticulocyte lysates. lane 1 = sv5-infected cv1 cell lysate as a marker. in vitro translated rnas were as follows: lane 2 = no rna control; lane 3 = poly(a)-containing mrnas from sv5-infected cv1 cells; lanes 4 and 5 = "synthetic" p rna from site-specifically mutated template dnas; lane 6 = "natural" p rna; lane 7 = v rna from clone p203-1. dashes = proteins p and v. arrowhead and dot indicate protein products thought to originate from initiation at internal methionine residues 141 and 183, respectively.
as shown in figure 9, the sequence of nucleotides 548-551 in the v clone (4 g residues) is easier to interpret than nucleotides 548-553 in the p clone (6 g residues). these artifactual bands can be eliminated when the sequencing is performed with a modified form of t7 dna polymerase (sequenase tm) in conjunction with ditp instead of dgtp, unless there is a strong secondary structure in the template strand, in which case the artifacts are exacerbated (tabor, 1987). when this was done for the p clone dna (figure 9) or v clone dna (data not shown), the t7 dna polymerase nearly stopped its processive synthesis at nucleotides 543-550, which suggests that there is a native secondary structure in this region.

figure 9 (legend, continued): left: the sv5 virion rna sequence from nucleotides 520-620 of the p/v gene was examined for regions of strong secondary structure with the aid of the computer program fold (intelligenetics inc., palo alto, ca). the stem-loop structure shown has an energy of ΔG = -53.7 kcal/mol. the four c residues at nucleotides 548-551 are boxed. the arrow denotes the direction of mrna transcription. right: nucleotide sequences obtained by the dideoxynucleotide chain-terminating method using the klenow fragment of e. coli dna polymerase or a modified form of t7 dna polymerase (sequenase tm) on a p clone cdna template (klenow and t7) or a v clone cdna template (klenow). in the sequenase reactions (t7) ditp was used in place of the usual dgtp. the region of the four or six g residues between nucleotides 548-551 is indicated by a star.

we have obtained the nucleotide sequence of the paramyxovirus sv5 p and v gene and have determined the strategy by which both proteins are expressed by a single gene. the p and v proteins are translated from two independent mrnas that are synthesized in sv5-infected cells and are found to differ by the presence in the p mrna of two additional nucleotides.
a comparison of the nucleotide sequences of the p and v cdnas and the sv5 genomic vrna showed that the two additional g residues present in the p mrna are not templated by the sv5 virion rna (figure 7). it could be argued that the vrna sequencing might not detect a minor vrna species of less than 5% abundance. however, there is no biological evidence for the involvement of more than one virus genome in the sv5 infectious cycle. using a combination of in vitro translation of t7 runoff transcripts, immunoprecipitation of the in vitro synthesized proteins using monoclonal antibodies, oligonucleotide-directed mutagenesis, and metabolic labeling of sv5-infected cell proteins using specific amino acids ([35s]methionine or [35s]cysteine), we have shown that p and v are amino coterminal proteins that have different c termini. the results presented here confirm earlier observations that the p and v proteins of sv5 have tryptic peptides in common (paterson et al., 1984). thus sv5 differs from many paramyxoviruses and morbilliviruses that use functionally bicistronic mrnas to synthesize the p protein and a second protein, known as c, from overlapping reading frames (giorgi et al., 1983; galinski et al., 1986; bellini et al., 1985; barrett et al., 1985). early peptide mapping data obtained for the p and "c-like" proteins of two other paramyxoviruses, newcastle disease virus (ndv) and mumps virus, suggested that both proteins are encoded by the same reading frame (collins et al., 1982; herrler and compans, 1982). recently the ndv and mumps virus p genes have been sequenced and found to contain one open reading frame (sato et al., 1987; mcginnes et al., 1988; takeuchi et al., 1988) from which it has been suggested that both the p and "c-like" proteins are derived, with the "c-like" protein arising from initiation at an internal aug codon (mcginnes et al., 1988). sv5 is therefore seemingly unique among paramyxoviruses in having two mrnas transcribed from the p gene.
the rna-dependent rna polymerase of negative strand rna viruses functions as part of a transcriptase complex composed of the template vrna in tight association with the nucleoprotein (np), and the p and l proteins, which are thought to be responsible for the polymerase activity (buetti and choppin, 1977; hamaguchi et al., 1983). transcription of the virus-specific mrnas by the transcriptase complex is believed to occur entirely in the cytoplasm of infected cells and is independent of host-cell mrna synthesis. the mechanism responsible for the addition of the untemplated g residues present in the p mrna is unknown, nor is it known whether it is a cotranscriptional or posttranscriptional process. however, the virus-encoded rna polymerase of negative strand rna viruses is also responsible for the polyadenylation of virus-specific mrnas, a process that is thought to occur by a "slippage" or "stuttering" mechanism involving the reiterative copying by the polymerase of a stretch of u residues located at the end of each gene. as the nontemplated g residues are added to the p transcript at a position where the template vrna has a run of four c residues, it is possible that the sv5 polymerase "stutters" while copying this region of the genome and thus adds the nontemplated nucleotides. it is interesting that immediately upstream of the four c residues on the sv5 genomic rna is the sequence 3'-aaaauucu-5' (figure 9), which resembles the putative polyadenylation signal found at the end of sv5 genes and in fact is identical to the sequence at the end of the sv5 hn gene (hiebert et al., 1985), making this an attractive model for the mechanism by which the nontemplated gs are added. however, it cannot be ruled out that the nontemplated g residues in the p mrna are added as a consequence of some form of rna editing analogous to that found in mitochondrial transcripts in trypanosomes (benne et al., 1986; feagin et al., 1987, 1988; shaw et al., 1988) or the mammalian apolipoprotein-b mrna (powell et al., 1987; chen et al., 1987). while screening the sv5 cdna library for a p cdna, 22 clones were sequenced across the region described above and only clones with either four or six g residues were found. this would suggest that whatever the mechanism involved in the addition of the nontemplated g residues, it is extremely specific. with the aid of the computer algorithm fold, a region of secondary structure was predicted for this part of the template rna and it is therefore possible that this could play a role in either the mechanism itself or its regulation (figure 9). an examination of the predicted amino acid sequences of the p and v proteins reveals several interesting features. as mentioned above, the p and v proteins are amino coterminal and have their first 164 residues in common (figure 1).

figure 10 (legend, continued): the published nucleotide sequences of the p genes of several paramyxoviruses and the morbillivirus, measles virus, were translated in all three reading frames. in each case a cysteine-rich region was identified in a reading frame overlapping that for the p protein and is listed in the single-letter amino acid code. only the region of significant conservation of sequence is shown with its corresponding nucleotide number; the n-terminal region of the open reading frame is omitted. the star at the end of the amino acid sequence represents a translation termination codon. the boxes identify positions where three or more amino acids have been conserved in all six viruses, a dash indicates that a gap was placed in the alignment, and the star above the sequences shows the seven conserved cysteine residues. sources for the p gene nucleotide sequences are as follows: sv5, this publication; mumps virus, takeuchi et al., 1988; ndv, sato et al., 1987; sendai virus, shioda et al., 1983, and giorgi et al., 1983; parainfluenza virus 3 (pi-3), galinski et al., 1986, and luk et al., 1986; measles virus, bellini et al., 1985.
an unusual feature of the shared region is the large number of proline residues: 17 prolines in 164 amino acids (figure 1). however, the most striking characteristic observed in either protein is the c-terminal portion of protein v, which consists of a cysteine-rich region bearing a remarkable resemblance to cysteine-rich regions found in the adenovirus e1a protein (for review see moran and matthews, 1987), the yeast transcription factor gal4 (johnston and dover, 1987), and proteins belonging to the steroid hormone receptor superfamily (for review see evans, 1988). in these proteins and others possessing a similar domain it is thought that the binding of metal ions by the cysteine-rich region plays an important role in the binding of nucleic acid by the protein, in mediating protein-protein interactions, or in stabilizing oligomeric forms of a protein, as in the tat protein of human immunodeficiency virus (frankel et al., 1988). because of the significance of the cysteine-rich regions in other proteins, it was of interest to determine whether the sequence identified here in protein v had been conserved among other paramyxovirus p genes. consequently, we compared the cysteine-rich region from protein v with the protein sequences predicted in all three reading frames from the nucleotide sequence of the p genes from mumps virus, ndv, sendai virus, parainfluenza virus 3, and measles virus (takeuchi et al., 1988; sato et al., 1987; mcginnes et al., 1988; shioda et al., 1983; giorgi et al., 1983; galinski et al., 1986; luk et al., 1986; bellini et al., 1985). as shown in figure 10, a highly conserved cysteine-rich region was identified in an open reading frame in all the different paramyxovirus p gene sequences examined. interestingly, the cysteine-rich region is more conserved between the different paramyxoviruses than is the amino acid sequence of the p protein encoded by the same nucleotides but translated in another reading frame (data not shown).
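the kind of comparison described here, translating a gene in all three reading frames and looking for a cysteine-rich stretch, can be sketched as follows. this is a minimal illustration and not the procedure used for figure 10: the function names, the toy dna string `TOY_DNA`, and the scanning rule are assumptions introduced here. the default window of 29 residues containing at least 7 cysteines simply mirrors the region reported for protein v (residues 190-218); the toy call uses a smaller window so that it fits a 19-nucleotide example.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(seq: str, frame: int) -> str:
    """Translate `seq` in reading frame 0, 1, or 2; stops appear as '*'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")      # 'X' for ambiguous codons
        for i in range(frame, len(seq) - 2, 3)
    )


def cysteine_rich_windows(protein: str, window: int = 29, min_cys: int = 7):
    """Return (start, end, n_cys) for each stop-free window with enough cysteines.

    Positions are 1-based and inclusive, counted along the translated frame.
    """
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        if "*" in segment:
            continue
        n_cys = segment.count("C")
        if n_cys >= min_cys:
            hits.append((start + 1, start + window, n_cys))
    return hits


# Hypothetical toy sequence in which only the second frame is cysteine rich.
TOY_DNA = "GTGCAAATGTCCATGCGGA"
frames = {f: translate_frame(TOY_DNA, f) for f in range(3)}
toy_hits = {f: cysteine_rich_windows(p, window=6, min_cys=3)
            for f, p in frames.items()}

print(frames)
print(toy_hits)
```

applied to real p gene sequences, a scan of this kind would only flag candidate regions; judging conservation between viruses, as in figure 10, additionally requires aligning the flagged regions.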
as the p protein is part of the paramyxovirus transcriptase complex, the conservation of the cysteine-rich region must have important biological significance. it will be important to determine whether a protein containing this cysteine-rich region is synthesized in cells infected with other paramyxoviruses in addition to the already identified p or p and c proteins derived from the "p" gene. the function of protein v has yet to be elucidated. however, as v is found associated with purified sv5 virions (our unpublished data) and as group i antibodies precipitate l, np, p, and v in a complex (randall et al., 1987; our unpublished data), it remains a possibility that protein v may play a role in transcription and/or replication of the virus genome in infected cells.

monolayer cultures of a variant of the mdbk line of bovine kidney cells and the tc7 clone of cv-1 cells were grown in dulbecco's modified eagle's medium (dmem) supplemented with 10% fetal calf serum. stock virus was grown in mdbk cells infected with the w3 strain of sv5 (choppin, 1964) as described previously (peluso et al., 1977). for all biochemical experiments, cv-1 cells were used and infected as described previously (paterson et al., 1984), except that for metabolic labeling of infected cell proteins, monolayers were incubated in methionine- and cysteine-free dmem and proteins were labeled using either tran[35s]label (icn radiochemicals, irvine, ca), [35s]cysteine, or [35s]methionine (amersham corp., arlington heights, il). messenger rnas were isolated as described previously (paterson et al., 1984). cdna synthesis, isolation of sv5-specific clones, and the identification of cdna encoding the various viral gene products have been described (paterson et al., 1984).
three clones, p10, p27, and p203-1, were sequenced over their entire length both by the chemical cleavage method (maxam and gilbert, 1980) and, after subcloning into the psti site of the replicative form of bacteriophage m13mp19, by the dideoxy chain-termination method (sanger et al., 1977). dideoxy primer extension sequencing on purified sv5 genomic rna and poly(a)-containing mrna was performed using avian myeloblastosis virus reverse transcriptase (molecular genetic resources, tampa, fl) and p gene-specific primers as described previously (air, 1979). direct sequencing of double-stranded plasmid dna was carried out by the dideoxy chain-termination method using the klenow fragment of e. coli dna polymerase (bethesda research laboratories, gaithersburg, md) as described by sanger et al. (1977) or a modified form of t7 polymerase (sequenase tm, united states biochemical corp., cleveland, oh) according to the manufacturer's instructions. restriction endonucleases, bacterial alkaline phosphatase, and t4 dna ligase were obtained from bethesda research laboratories, and t4 polynucleotide kinase from pharmacia fine chemicals (piscataway, nj). oligonucleotides were synthesized by the northwestern university biotechnology facility on an applied biosystems (foster city, ca) model 380b dna synthesizer and were purified as described (paterson and lamb, 1987). the p203-1 cdna was excised from pbr322 by hhai and mstii digestion, thereby eliminating the g/c tails introduced during cdna cloning; xbai linkers were added and the cdna subcloned into the xbai site of pgem-2 (promega biotec, madison, wi). deletion of the 5' end of the gene was performed by digesting pgem-2 containing the p203-1 cdna with xbai and hindiii, isolating the 3' portion of the gene, adding xbai linkers, and subcloning back into pgem-2.
to construct both the protein v stop codon elimination mutant and the frameshift mutant, p203-1 cdna was subcloned into the xbai site of the replicative form of bacteriophage m13mp19, and oligonucleotide-directed mutagenesis was carried out according to the procedure of zoller and smith (1982) using mutagenic oligonucleotides consisting of 12 nucleotides either side of the site of the mutation. dna containing the desired mutation was subcloned into pgem-2 and the mutation verified by direct plasmid dna sequencing using the dideoxy chain termination method (sanger et al., 1977) and a p-specific oligonucleotide primer. for transcription of the entire coding region, plasmid dnas were linearized downstream of the t7 promoter and the p or v insert, using ecori. for the synthesis of truncated forms of the mrna, the dna template was linearized using either avaii or clai, which recognize sites within the coding region of the cdna. in vitro synthesis of mrna was carried out as described previously (hull et al., 1988) and 1 μg of rna was used to program a rabbit reticulocyte lysate as described below. t7 dna-dependent rna polymerase was obtained from bethesda research laboratories, rnasin tm and rq dnase tm from promega biotec, and m7g(5')ppp(5')g (sodium salt) was from pharmacia fine chemicals.

in vitro translation of mrnas

mrnas were translated in vitro using a micrococcal nuclease-treated rabbit reticulocyte lysate (promega biotec) according to the manufacturer's instructions. the in vitro-synthesized products were labeled using [35s]methionine. one-fifth volume of each translation reaction was immunoprecipitated as described below. immunoprecipitation was performed as previously described (lamb et al., 1978; erickson and blobel, 1979) using monoclonal antibodies to the p and v proteins kindly provided by dr. rick randall (randall et al., 1987).
samples were prepared for electrophoresis and analyzed by sds-page on 15% polyacrylamide gels as previously described (lamb et al., 1978). poly(a)-containing mrnas from sv5-infected cv-1 cells were isolated as described (paterson et al., 1984). to determine whether more than one mrna is transcribed from the p gene, nuclease s1 analysis was performed as previously described (lamb and lai, 1982). the labeled dna fragments used as probes were: a hhai-avaii fragment and a bamhi-psti dna fragment (nucleotides 42-889 and 434-660, respectively) 3' uniquely labeled at nucleotides 42 and 434, and a hhai-avaii fragment and a bamhi-hphi fragment (nucleotides 42-892 and 438-642, respectively) 5' uniquely labeled at nucleotides 892 and 642. nuclease s1 was obtained from boehringer mannheim biochemicals, indianapolis, in.

nucleotide sequence coding for the "signal peptide" and n terminus of the hemagglutinin from an asian (h2n2) strain of influenza virus
binding of mammalian ribosomes to ms2 phage rna reveals an overlapping gene encoding a lysis function
nucleotide sequence of the entire protein coding region of canine distemper virus polymerase-associated (p) protein mrna
measles virus p gene codes for two proteins
major transcript of the frameshift coxii gene from trypanosome mitochondria contains four nucleotides that are not encoded in the dna
vaccinia virus produces late mrnas by discontinuous synthesis
the 2.2 kb e1b mrna of human ad12 and ad5 codes for two tumor antigens starting at different aug triplets
completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus
alternative splicing: a ubiquitous mechanism for the generation of multiple protein isoforms from single genes
influenza b virus genome: sequences and structural organization of rna segment 8 and the mrnas coding for the ns1 and ns2 proteins
an efficient ribosome frameshifting signal in the polymerase-encoding region of the coronavirus ibv
the transcriptase complex of the paramyxovirus sv5
apolipoprotein b-48 is the product of a messenger rna with an organ-specific in-frame stop codon
multiplication of a myxovirus (sv5) with minimal cytopathic effects and without interference
coding assignments of the five smaller mrnas of newcastle disease virus
ribosomal initiation at alternate augs on the sendai virus p/c mrna
early events in the biosynthesis of the lysosomal enzyme cathepsin d
the steroid and thyroid hormone receptor superfamily
extensive editing of the cytochrome c oxidase iii transcript in trypanosoma brucei
developmentally regulated addition of nucleotides within the apocytochrome b transcripts in trypanosoma brucei
tat protein from human immunodeficiency virus forms a metal-linked dimer
molecular cloning and sequence analysis of the human parainfluenza 3 virus mrna encoding the p and c proteins
sendai virus contains overlapping genes expressed from a single mrna
translational modulation in vitro of a eukaryotic viral mrna encoding overlapping genes: ribosome scanning and potential roles of conformational changes in the p/c mrna of sendai virus
transcriptive complex of newcastle disease virus. i. both l and p proteins are required to constitute an active complex
t antigen repression of sv40 early transcription from two promoters
synthesis of mumps virus polypeptides in infected vero cells
hemagglutinin-neuraminidase protein of the paramyxovirus simian virus 5: nucleotide sequence of the mrna predicts an n-terminal membrane anchor
integration of a small integral membrane protein, m2, of influenza virus into the endoplasmic reticulum: analysis of the internal signal-anchor domain of a protein with an ectoplasmic nh2 terminus
expression of the rous sarcoma virus pol gene by ribosomal frameshifting
two efficient ribosomal frameshifting events are required for synthesis of mouse mammary tumour virus gag-related polyproteins
mutations that inactivate a yeast transcriptional regulatory protein cluster in an evolutionary conserved dna binding domain
evidence for in vivo trans splicing of pre-mrnas in tobacco chloroplasts
trans splicing of mrna precursors in vitro
a trans-spliced leader sequence on actin mrna in c.
elegans sequence of interrupted and uninterrupted mrnas and cloned dna coding for the two overlapping nonstructural proteins of influenza virus spliced and unspliced messenger rnas synthesized from cloned influenza virus m dna in an sv40 vector: expression of the influenza virus membrane protein (m1) evidence for a ninth influenza viral polypeptide sequences of mrnas derived from genome rna segment 7 of influenza virus: colinear and interrupted mrnas code for overlapping proteins messenger rna encoding the phosphoprotein (p) gene of human parainfluenza virus 3 is bicistronic sequencing end-labeled dna with base-specific chemical cleavages the p protein and the non-structural 38k and 29k proteins of newcastle disease virus are derived from the same open reading frame multiple functional domains in the adenovirus e1a gene identification of a novel y branch structure as an intermediate in trypanosome mrna processing: evidence for trans splicing splicing of messenger rna precursors ability of the hydrophobic fusion-related external domain of a paramyxovirus f protein to act as a membrane anchor analysis and gene assignment of mrnas of a paramyxovirus, simian virus 5 polypeptide synthesis in simian virus 5 infected cells a novel form of tissue-specific rna processing produces apolipoprotein-b48 in intestine isolation and characterization of monoclonal antibodies to simian virus 5 and their use in revealing antigenic differences between human, canine and simian isolates dna sequencing with chain-terminating inhibitors molecular cloning and nucleotide sequence of p, m and f genes of newcastle disease virus avirulent strain d26 vaccinia virus late transcripts generated in vitro have a poly(a) head discontinuous transcription or rna processing of vaccinia virus late messengers results in a 5' poly(a) leader a previously unrecognized influenza b virus glycoprotein from a bicistronic mrna that also encodes the viral neuraminidase editing of kinetoplastid mitochondrial mrnas 
by uridine addition and deletion generates conserved amino acid sequences and aug initiation codons sequence of 3,687 nucleotides from the 3' end of sendai virus genome rna and the predicted amino acid sequences of viral np, p and c protein trans splicing of mrna precursors sequence analysis of the p and c protein genes of human parainfluenza virus type 3: patterns of amino acid sequence homology among paramyxovirus proteins sequence coding for the alphavirus nonstructural proteins is interrupted by an opal termination codon evidence for trans splicing in trypanosomes sequenasetm: step-by-step protocols for dna sequencing with sequenase tm. united states biochemical corporation molecular cloning and sequence analysis of mumps virus gene encoding the p protein: mumps virus p gene is monocistronic murine leukemia virus protease is encoded by the gag-pol gene and is synthesized through suppression of an amber termination codon oligonucleotide-directed mutagenesis using m13-derived vectors: an efficient and general procedure for the production of point mutations in any fragment we thank margaret a. shaughnessy for excellent technical assistance and rick e. randall of st. andrews university, st. andrews, scotland for kindly providing the monoclonal antibodies to p and v. this research was supported by national institutes of health research grants ai-23173 and ai-20201. during the course of this work, r. a. l was an established investigator of the american heart association.the costs of publication of this article were defrayed in part by the payment of page charges. this article must therefore be hereby marked "advertisement" in accordance with 16 u.s.c. section 1734 solely to indicate this fact. 
key: cord-295062-8rl4kswe authors: marsh, mark; helenius, ari title: virus entry: open sesame date: 2006-02-24 journal: cell doi: 10.1016/j.cell.2006.02.007 sha: doc_id: 295062 cord_uid: 8rl4kswe detailed information about the replication cycle of viruses and their interactions with host organisms is required to develop strategies to stop them. cell biology studies, live-cell imaging, and systems biology have started to illuminate the multiple and subtly different pathways that animal viruses use to enter host cells. these insights are revolutionizing our understanding of endocytosis and the movement of vesicles within cells. in addition, such insights reveal new targets for attacking viruses before they can usurp the host-cell machinery for replication. viruses are obligatory intracellular parasites; therefore, their replication (and the pathogenic consequences of infection) depends critically on the ability to transmit their genomes from infected to noninfected host organisms and from infected to uninfected cells. the small size, structural simplicity, and lack of any metabolic or motile activities severely limit the types of processes that virus particles can themselves undertake to promote the transfer process. as passive, inert particles, they have evolved to exploit the behavior and the physiology of their hosts. at the cell level, this is manifested in the activation of endogenous cellular responses that provide assistance to viruses so that they can cross membranes and other barriers and deliver their genes into the cytosol or the nucleus. recent studies indicate that cells offer a variety of endocytosis, trafficking, and sorting mechanisms that animal viruses can take. moreover, viruses have become valuable tools to study some of these processes. here, we review the general concepts in virus entry and discuss some of the emerging issues. 
virus particles as devices for targeted gene transfer
a viral particle is composed of nucleic acids (rna or dna), protein, and, in the case of enveloped viruses, membrane lipids. the proteins include structural components, such as capsid proteins; matrix proteins; membrane glycoproteins; and, in many cases, accessory proteins such as reverse transcriptases, rna polymerases, kinases, and proteases. one or more shells of protein protect the rna or dna genome in the so-called capsid, which is often helical or icosahedral. enveloped viruses have, in addition, a lipid bilayer membrane that serves to protect the capsid and the genome and operates as a "transport vesicle" during cell-to-cell transmission. transmission involves three main stages: the assembly of virus particles in infected cells, their release to the extracellular space, and entry into a new cell. nonenveloped virus particles and the capsids of enveloped viruses are assembled in the cytosol or the nucleus of the infected cell. the lipid bilayer membrane (envelope) of enveloped viruses is acquired during a budding process through a cellular membrane. with few exceptions, the key proteins in the capsid and the envelope are encoded by the viral genome. given that the components of the virus particle are often synthesized in different parts of the cell, the assembly of a particle is a remarkable example of coordinated molecular sorting. the assembly processes involve the establishment of a complex network of specific interactions that bring the relevant viral components together into a particle with precise stoichiometry and geometry and with the exclusion of most cellular components. that all this is possible with the limited genetic information contained in viral genomes is impressive. it serves as a testament to the viruses' ingenious structural designs and the generous assistance of the cell. enveloped particles leave the infected cell inconspicuously by budding and secretion.
nonenveloped viruses are usually thought to undergo release through cell lysis, but some may escape by secretory mechanisms after budding into membrane-bound compartments and then losing their membrane (altenburg et al., 1980). others may subvert cellular autophagy pathways to gain access to exocytic organelles (jackson et al., 2005). the particles released from cells are stable structures crosslinked by networks of intermolecular interactions. they are resistant to the stresses encountered in the extracellular space during transmission from cell to cell and host to host. however, at the same time, the assembly and maturation program has made sure that the particles can fall apart at a moment's notice during entry into a new cell. the reversibility of assembly is important to liberate the genome in infectious form and release the accessory proteins. many of the stabilizing interactions in the particle must be undone: the envelope must be shed, the capsid opened, and the nucleic acids decondensed. to make this possible, the entire virus particle or specific proteins in it are locked in metastable conformations and poised to undergo major conformational changes when triggered by appropriate cues during entry. the uncoating program is thus built into the virion during assembly, allowing major transformations in the particle without the need for external energy. like assembly, entry and uncoating typically occur in several tightly controlled, consecutive steps. these steps, shown in figure 1 for a virus that enters via endocytosis and moves to the nucleus (e.g., an adeno- or polyomavirus), start with events at the cell surface and end with the decondensation of the genome at the site of replication. as the virus progresses in its entry program, it undergoes changes that lead to events such as penetration, capsid destabilization, and uncoating of the genome. many of these changes result from conformational alterations in metastable viral structures.
they are triggered by receptor binding, exposure to low ph, reentry into a reducing environment, enzyme-induced covalent modifications, and other cellular cues (earp et al., 2005; harrison, 2005; hogle, 2002; smith and helenius, 2004). viruses exploit such signals not only to induce changes in the particle and dissociation of protein subunits but also to coordinate movement from one compartment and location in the cell to another, ensuring that each step in the uncoating program occurs at the right point in the sequence, at the right time, and in the right place. although the incoming viruses depend on cues from the cell, the cell also responds to signals induced by the virus. simply by binding to surface receptors, and perhaps by clustering them, many viruses "knock at the door" of the cell by activating cellular signaling cascades. this is often essential because it results in the local activation of ligand-triggered processes that viruses require for entry, such as caveolar/raft endocytosis, clathrin-coat assembly, and actin-cortex dissociation. this aspect of virus entry is complex but increasingly central for understanding the infection process.
in order to infect, viruses must first bind to the cell surface. the surface structures that they bind to are of two general types depending on the functional consequences of the interaction. attachment factors serve to bind the particles and thus help to concentrate viruses on the cell surface. such interactions can be relatively nonspecific. often they involve interactions with heparan sulfate or other carbohydrate structures on the cell surface (ugolini et al., 1999; vlasak et al., 2005; young, 2001). unlike attachment factors, virus receptors actively promote entry. they can do so by initiating conformational changes in the virus particle, by activating signaling pathways, and by promoting endocytic internalization.
often the receptors accompany the virus into the cell during endocytic uptake and may then play a role intracellularly in the penetration reaction. given that the interactions are usually highly specific, the presence of receptors determines to a large degree which cell types and species can be infected. these interactions and the molecules involved are of crucial importance for understanding not only the mechanisms of entry but also the biology of infection and pathogenesis.
figure 1. whether enveloped or nonenveloped, many viruses depend on the host cell's endocytic pathways for entry. they follow a multistep entry and uncoating program that allows them to move from the cell periphery to the perinuclear space. in this example, the virus proceeds to deliver its uncoated genome into the nucleoplasm. the interaction between the virus and the host cell starts with virus binding to attachment factors and receptors on the cell surface, followed by lateral movement of the virus-receptor complexes and the induction of signals that result in the endocytic internalization of the virus particle. after vesicular trafficking and delivery into the lumen of endosomes, caveosomes, or the er, a change in the virus conformation is induced by cellular cues. this alteration results in the penetration of the virus or its capsids through the vacuole membrane into the cytosolic compartment. enveloped viruses use membrane fusion for penetration, whereas nonenveloped viruses induce lysis or pore formation. after targeting and transport along microtubules, the virus or the capsid binds, as in this example, to the nuclear pore complex, undergoes a final conversion, and releases the viral genome into the nucleus. the details in the entry program vary for different viruses and cell types, but many of the key steps shown here are general.
in recent years, hundreds of attachment factors and receptors for different viruses have been identified, resulting in a large body of valuable information (young, 2001). the crystal structures of several viruses and viral proteins bound to receptors have also been solved (kwong et al., 1998; rossmann et al., 2002; skehel and wiley, 2000; stewart et al., 2003). within the scope of this review, it is not possible to discuss the attachment factors and receptors in depth; however, it is important to emphasize that the list encompasses a wide variety of different proteins, lipids, and carbohydrates. included are ion transporters, adhesion factors, signaling proteins, and a variety of other cell-surface receptors. proteoglycans and glycolipids are commonly used as virus receptors or attachment factors (young, 2001). moreover, influenza viruses and some paramyxoviruses have spike glycoproteins with lectin domains that bind to sialic acid, and the major coat protein of polyomaviruses, vp1, binds to the glycan moiety of specific gangliosides (skehel and wiley, 2000; stehle and harrison, 1997; tsai et al., 2003). although individual interactions between viruses and their receptors are specific, they are often of low affinity. however, the avidity increase resulting from multiple receptor binding sites on virus particles often guarantees nearly irreversible binding. moreover, multisite binding is likely to cluster receptor proteins, which in turn may activate signaling pathways and/or recruitment to endocytic structures (see below). numerous viruses are known to use more than one type of receptor, either in parallel or in series. the consecutive use of cd4 and a chemokine receptor by the human immunodeficiency virus (hiv type 1) is a well-studied example of the latter. in this case, the two receptors are needed to induce major conformational changes in the hiv envelope protein that initiate membrane fusion (berger et al., 1999).
interestingly, hiv-1 can also bind to glycosylceramides and heparan sulfate, interactions that may facilitate the initial recruitment of virus to susceptible cells (long et al., 1994; ugolini et al., 1999). moreover, the presence of specific glycosphingolipids in the target cell membrane can enhance cd4/coreceptor-dependent fusion (puri et al., 1998). viruses, in particular those that undergo rapid mutation, are also known to be able to switch receptors (byrnes and griffin, 2000; klimstra et al., 1998) or adapt to use alternative receptors when the primary receptor is absent (vlasak et al., 2005). this is one of the risks posed by the current avian influenza epidemic: the avian hemagglutinin glycoprotein may, through mutations, evolve to interact more potently with glycoconjugates on human cells.
the entry of coxsackie b virus provides an interesting example of how viruses can exploit the different properties of cell-surface receptors to bring about stepwise entry. in this case, the host cells are epithelial cells that grow in tight monolayers. coxsackie b viruses are human picornaviruses that cause meningitis and myocarditis. they are simple, nonenveloped rna viruses that replicate in the cytosol and do not need low ph for penetration. they share an essential receptor molecule with adenoviruses, a glycoprotein called car (the coxsackie and adenovirus receptor) (zhang and bergelson, 2005). in epithelial cells, car is a component of tight junctions and is therefore not accessible to incoming viruses from the apical side. a recent study addressed the question of how the virus gains access to its receptor, i.e., how it breaches the epithelial barrier (coyne and bergelson, 2006). the starting point for the study was the observation that many coxsackie b strains interact with a coreceptor present on the apical surface of epithelial cells, the decay-accelerating factor (daf, a gpi-anchored protein) (shieh and bergelson, 2002).
virus binding leads to crosslinking of daf molecules, inclusion into lipid rafts, and activation of the tyrosine kinase c-abl (coyne and bergelson, 2006). once activated, c-abl was found to activate the rho-family small gtpase rac, which in turn induced a reorganization of the actin cytoskeleton and promoted transfer of bound viruses from the apical surface to the tight-junction region of the cell. this made it possible for the virus to associate with car and for car to induce a conformational change in the virus particle, an obligatory step in picornavirus uncoating and entry (hogle, 2002). for the viral rna to be released into the cytoplasm, the modified particles were then internalized by caveolar endocytosis, which was activated by phosphorylation of tyr14 in caveolin-1 by fyn, a member of the src family of nonreceptor tyrosine kinases. the detailed route taken by the virus inside the cell remains unclear at this point, but the final release of the viral rna into the cytosol is likely to occur in the endoplasmic reticulum (er). this study elegantly demonstrates how a virus exploits multiple cellular receptors and functions to overcome the obstacles it encounters during entry into cells. in this particular cellular context, the coreceptor is needed to overcome the inaccessibility of the main receptor. as discussed below, the signaling cascades are often important in both entry steps and downstream activation of the cytoskeleton and endocytic machinery.
it is apparent that many viruses make use of the cell's signaling pathways during entry. this was first recognized for adenoviruses, which use car as a primary receptor and integrins as coreceptors (li et al., 1998; nemerow and stewart, 1999).
nemerow and coworkers demonstrated that the interaction between adenovirus pentons (protein complexes which together with the hexons make up the adenovirus capsid) and integrins activates phosphatidylinositol 3-kinase (pi(3)k), which in turn activates rac and cdc42, resulting in the polymerization of actin and clathrin-mediated endocytosis of the virus. activation of many different signaling pathways has since been described with the involvement of a variety of factors, including serine/threonine, tyrosine, and pi kinases; phosphatases; and a variety of small gtpases (including arf, rab, and rho family members) (greber, 2002; pelkmans et al., 2005). viruses use signaling activities to induce changes in the cell that promote viral entry and early cytoplasmic events, as well as to optimize later processes in the replication cycle. initially, the viruses need to make their presence on the cell surface known so that the cell can launch an endocytic response to bring them in. for example, the internalization of sv40 by caveolar/raft endocytosis is regulated by at least five different kinases. inhibition of tyrosine kinases in particular blocks internalization and dramatically reduces infection (chen and norkin, 1999; pelkmans et al., 2002). in other cases, a virus may need to induce lateral movement along the membrane. this can be seen very dramatically when viruses such as murine leukemia virus and hiv-1 bind to filopodia and proceed to "surf" on the outside of these structures toward the cell body (lehmann et al., 2005). as discussed above, lateral movement is also needed for coxsackie b viruses to reach the car receptor in epithelial cells, and this movement requires activation of c-abl following virus-induced crosslinking of daf (coyne and bergelson, 2006). the signals can be generated in several ways. the viruses may activate cellular signaling molecules directly by using them as receptors. this may be why so many viruses bind integrins.
viruses may also induce signaling by clustering specific cell-surface proteins or lipids. interestingly, a number of viruses use gpi-anchored proteins and gangliosides, which are only associated with the outer leaflet of the plasma membrane, as their receptors. that this often leads to activation of tyrosine kinases on the cytosolic side may be related to the fact that the gpi-anchored proteins and gangliosides become lipid-raft associated when clustered (coyne and bergelson, 2006; parton, 1994; parton and richards, 2003; pelkmans et al., 2002; sharma et al., 2004). being dually acylated, some src-family kinases (though not src itself) are also enriched in lipid-raft microdomains, but on the cytoplasmic surface of the membrane. accordingly, there is increasing evidence that many viruses associate with lipid rafts to initiate intracellular signaling (coyne and bergelson, 2006; damm et al., 2005; ono and freed, 2005).
the transfer of the genome and accessory proteins through the barrier of a cellular membrane into the cytosol is called penetration. for enveloped viruses, penetration involves membrane fusion, and, for nonenveloped viruses, it involves pore formation or membrane lysis. the molecular mechanisms of viral membrane fusion are beginning to be understood in increasingly fine detail but will not be considered here (earp et al., 2005; harrison, 2005; kielian and rey, 2006). for the most part, the penetration mechanisms of nonenveloped viruses are less well understood. however, recent single-particle cryo-em analysis of polioviruses bound to receptor-containing membranes has started to provide detailed information on the structural changes involved in picornavirus penetration and the organization of poliovirus-induced membrane pores through which the viral rna is believed to enter the cytoplasm (bubeck et al., 2005).
the cellular membrane penetrated during virus entry is either the plasma membrane or the limiting membrane of an intracellular organelle, the lumen of which viruses reach after endocytosis. in all known cases, the viruses or their capsids penetrate first into the cytosol. most rna viruses (though not all) replicate in the cytosol, often in contact with specific organelles (salonen et al., 2005) . with the exception of poxviruses and iridoviruses, dna viruses and their capsids are subsequently transported to the nucleus for replication. it has been known for a long time that certain enveloped viruses such as herpes simplex virus 1 (hsv-1); sendai virus; and many retroviruses, including hiv, have ph-independent fusion proteins and can therefore penetrate into cells by fusing directly with the plasma membrane. it is generally assumed that fusion events at the plasma membrane lead to productive infection, although this is difficult to prove because virus particles are also continuously endocytosed. among nonenveloped viruses, several families, including some picornaviruses (such as polio) and polyomaviruses, do not require low ph for penetration. while many of these are known to require endocytosis for penetration, it has been argued that some may penetrate directly through the plasma membrane, though this remains contentious (hogle, 2002) . however, if we view the whole spectrum of viruses, the majority do need endocytic internalization for penetration and productive infection, most likely because endocytosis offers real advantages. viruses that are ferried into cells inside endocytic vesicles can move deep into the cytoplasm, bypassing many of the barriers associated with the membrane cortex and cytosolic crowding (marsh and bron, 1997) and exploiting the molecular motors that are normally recruited to endocytic vesicles (dohner and sodeik, 2005) . 
a dependence on low ph for penetration allows viruses to use the decreasing ph of endocytic organelles as a cue to activate the penetration reactions and allows viral escape to the cytoplasm at specific locations or before the virus is delivered to the hydrolytic lysosomes (helenius et al., 1980) . in addition, no viral components remain on the cell surface after penetration for detection by the host's immune defenses. of the endocytic pathways taken by viruses, the most commonly used is the clathrin-mediated endocytic route ( figure 2c and figures 3a and 3b ). it transports incoming viruses together with their receptors into early and late endosomes. clathrin-mediated endocytosis is a continuous process, and, for virus entry, it is usually rapid and efficient (marsh and helenius, 1989) . the incoming viruses are often exposed to the acidic milieu of endosomes within minutes after internalization, and many respond to the ph drop by undergoing changes that lead to penetration. depending on the ph threshold, the site of penetration is either the early (ph 6.5 to 6.0) or the late endosome (ph 6.0 to 5.5). in some cases, such as for ebola virus, sars coronavirus, and the nonenveloped mammalian reoviruses, acidic ph alone is not sufficient to induce fusion, and proteolytic cleavages in viral proteins by acid-dependent endosomal proteases, in particular cathepsins l and b, are needed to trigger the change to the penetration-competent state (chandran et al., 2005; ebert et al., 2002; simmons et al., 2005) . for avian leukosis virus, both interaction of the viral envelope protein with a specific receptor (damico et al., 1998; hernandez et al., 1997) and low ph (mothes et al., 2000) are required for fusion. single-particle tracking of influenza virus in cultured simian kidney epithelial cells shows that, although 60% of the particles enter via clathrin-coated pits, 40% use a clathrin-independent pathway (rust et al., 2004) ( figure 2b) . 
surprisingly, of those that use the clathrin-dependent pathway, only 5% associate with preexisting clathrin-coated domains in the membrane; the majority enter via coated pits that assemble underneath the surface-bound viral particles. this implies induction of transmembrane signaling by the receptor-bound virus particles. a similar induction of coated-pit formation has also been observed for reovirus and semliki forest virus (sfv) (ehrlich et al., 2004; a. vonderheit and a.h., unpublished data); whether this represents recruitment of clathrin-coat components to the clustered cytoplasmic domains of viral receptor proteins or a more complex signaling cascade leading to clathrin recruitment is unclear.
endocytic clathrin-coated vesicles deliver their contents to early endosomes (figure 2c). these organelles, though morphologically complex, have usually been considered to be a homogeneous population. however, there is increasing evidence that viruses and other endocytosed cargo are selectively targeted to specific populations of endosomes (kirkham et al., 2005). for example, the early endosomes that receive the incoming influenza viruses belong to a population that rapidly relocates the viruses, by microtubule-mediated transport, to the perinuclear region, where penetration occurs by fusion (lakadamyali et al., 2003). sfv fuses soon after delivery to rab5 and eea1-positive early endosomes, but the fused virus is then transported in rab7-positive carriers to late endosomes (vonderheit and helenius, 2005). there is also a recent observation that vesicular stomatitis virus (vsv) may fuse with the internal membrane vesicles of multivesicular endosomes and that these internal membranes subsequently fuse with the limiting membrane of late endosomes to release the viral rna to the cytoplasm (le blanc et al., 2005). in this case, penetration would involve two membrane fusion events, the first mediated by the viral envelope protein and the second by an as yet uncharacterized cellular mechanism.
figure 2. in mammalian cells, many different mechanisms are available for the endocytic internalization of virus particles. some of these mechanisms, such as clathrin-mediated endocytosis, are ongoing, whereas others, such as caveolae, are ligand and cargo induced. currently, there is evidence for six pathways. (a) macropinocytosis is involved in the entry of adenoviruses. (b) a clathrin-independent pathway from the plasma membrane has been shown to exist for influenza virus and arenaviruses. (c) the clathrin-mediated pathway is the most commonly observed uptake pathway for viruses. the viruses are transported via early endosomes to late endosomes and eventually to lysosomes. (d) the caveolar pathway is one of several closely related, cholesterol-dependent pathways that bring viruses including sv40, coxsackie b, mouse polyoma, and echo 1 to caveosomes, from which many of them continue, by a second vesicle transport step, to the er. (e) a cholesterol-dependent endocytic pathway devoid of clathrin and caveolin-1, used by polyomavirus and sv40. (f) a pathway similar to (d) except dependent on dynamin-2. it is used by echo virus 1. depending on the virus and cell type, penetration reactions occur in five locations: the plasma membrane, early and late endosomes, caveosomes, and the er. note that the additional endocytic mechanism of phagocytosis also operates in many cells but has not as yet been linked to virus entry and is not included here.
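the single-particle tracking figures quoted above for influenza virus can be combined into shares of all incoming particles. the following is a minimal python sketch of that arithmetic, assuming the 60%, 40%, and 5% values reported in the text (rust et al., 2004); the function and variable names are illustrative and do not come from the original study.

```python
# Values quoted in the text for influenza virus (rust et al., 2004).
# All names below are hypothetical and chosen only for illustration.
CLATHRIN_FRACTION = 0.60         # particles entering via clathrin-coated pits
PREEXISTING_PIT_FRACTION = 0.05  # of those, share that join preexisting coated domains


def entry_route_breakdown(clathrin=CLATHRIN_FRACTION,
                          preexisting=PREEXISTING_PIT_FRACTION):
    """Return the share of ALL particles taking each entry route."""
    return {
        "clathrin, preexisting pit": clathrin * preexisting,
        "clathrin, pit assembled under virus": clathrin * (1 - preexisting),
        "clathrin-independent": 1 - clathrin,
    }


if __name__ == "__main__":
    for route, share in entry_route_breakdown().items():
        print(f"{route}: {share:.0%}")
```

under these assumptions, roughly 3% of all particles use a preexisting coated pit, about 57% enter through pits that assemble beneath the bound particle, and 40% take the clathrin-independent route.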
interestingly, the endocytic uptake and penetration of anthrax toxin also appears to involve a two-step process, the first involving interaction of the toxin with the internal membranes of a multivesicular endosome and the second a back fusion of these membranes with the endosome-limiting membrane (abrami et al., 2004). the complexity of the clathrin-mediated endocytic pathway is illustrated by the increasing number of alternative cofactors, adaptors, and tethering proteins that are involved in the formation of clathrin-coated pits and vesicles (robinson, 2004). using small interfering rna (sirna) silencing screens, more than 90 different kinases have recently been shown to regulate the clathrin-mediated internalization and early steps of infection of hela cells by vsv. these kinases belong to many different classes, including regulators of the cytoskeleton, cell cycle, cell growth, and membrane trafficking. they either increase or decrease the efficiency of vsv entry and early steps in the infection cycle. many of the kinases were shown to have a direct effect on the clathrin-mediated endocytosis of other ligands. that some viruses use clathrin-independent pathways for endocytosis and infection is now widely accepted. the best studied of these are the caveolar/raft pathways first observed for sv40, a simple nonenveloped dna virus (anderson et al., 1996; kartenbeck et al., 1989; stang et al., 1997). the caveolar/raft pathways, of which there are at least three related variants (figures 2d-2f), seem to specialize in the internalization of lipids including cholesterol, gpi-anchored proteins, and components of cholesterol-rich microdomains (lipid rafts). caveolae are also involved in the transcytosis of serum components in endothelial cells, and many cargo molecules of these pathways seem to have a role in signaling.
the capsids of sv40 and the related polyomavirus are composed of 72 homopentameric vp1 protein units that resemble cholera toxin b chain pentamers (stehle et al., 1996; stehle and harrison, 1997). like cholera toxin, these viruses bind to the sugar moiety of gangliosides and enter cells via caveolar/raft pathways that are dependent on cholesterol (figures 2d and 2e) and the activation of tyrosine-kinase signaling cascades (anderson et al., 1996; pelkmans et al., 2001; smith et al., 2003a; stang et al., 1997; tsai et al., 2003). activation involves tyrosine kinases, pi kinases, and raft lipids. dynamin 2, actin, caveolin-1, and rho gtpases are also involved, depending on the virus and the cell type (pelkmans et al., 2002). internalization occurs via caveolae that are activated for long-distance transport in the cytosol or via small vesicles that lack caveolin-1 (damm et al., 2005; tagawa et al., 2005) (figures 3c and 3d). the caveolar/raft pathway takes the majority of internalized viruses first to ph-neutral organelles in the cytoplasm called caveosomes (figure 2d). following a second activation step, the virions are then transported by caveolin-free, microtubule-dependent vesicles trafficking to the er, where penetration occurs (pelkmans et al., 2001). a recent study suggests that a lumenal er protein, erp29, a thioredoxin homolog without thiol oxidoreductase activity, facilitates penetration of polyomavirus by exposing the hydrophobic, c-terminal arm of vp1 (magnuson et al., 2005). in the case of sv40, we have found that the redox conditions in the er, as well as two thiol oxidoreductases, pdi and erp57, are important for sv40 uncoating and infection (m. schelhaas, l. pelkmans, and a.h., unpublished data).
since sirna silencing of derlin 1, a protein linked to the reverse translocation of substrates for er-associated protein degradation (erad), inhibits infection, it is likely that components of the erad pathway play a central role in sv40 penetration from the er. interestingly, a similar mechanism may explain how the cholera toxin a chains penetrate into the cytosol (tsai et al., 2002). rnai silencing screens with the entire human kinome have demonstrated that, in hela cells, the caveolar/raft pathways that mediate sv40 entry depend on some 80 different kinases. the set of kinases shows only partial overlap with those that regulate the clathrin-mediated entry of vsv. interestingly, many of the kinases involved in sv40 entry are known to have functions in integrin signaling and actin regulation. other viruses that use caveolar/raft endocytosis include echo 1, a picornavirus that binds to integrins, and coxsackie b in human caco-2 epithelial cells (coyne and bergelson, 2006; pietiainen et al., 2005). for echo 1 virus, caveolar/raft uptake and entry require signaling that involves protein kinase c, and penetration seems to occur in caveosomes without the involvement of the er (pietiainen et al., 2004; upla et al., 2004). in addition to these pathways, there is increasing evidence for clathrin-independent pathways of virus entry into the endosomal system (figure 2b). such a pathway has been observed for the arenavirus lymphocytic choriomeningitis virus (lcmv) and, as already mentioned, for a fraction of incoming influenza viruses (borrow and oldstone, 1994; rust et al., 2004; sieczkarski and whittaker, 2002). the molecular mechanisms involved in the pathway (or pathways) are not understood. by contrast, following binding to car and αv integrin coreceptors, adenovirus type 2 (ad2) activates an actin-dependent process classified as macropinocytosis (figure 2a), but virus endocytosis occurs via clathrin-coated pits.
inhibition of macropinocytosis inhibits virus entry, but how exactly this endocytic process influences virus penetration is unclear (meier et al., 2002). although the formation of primary vesicles displays great heterogeneity, the subsequent steps inside the cell involve either endosomes or caveosomes and require cholesterol (imelli et al., 2004). that these two organelle classes are not entirely independent of each other is shown by their regulation through a small subset of common kinases and by the observation that cargo can be transferred between them. together with bacterial toxins such as cholera, anthrax, and shiga toxin, viruses are emerging as valuable ligands for charting the various ligand-inducible pathways of endocytosis. the spectrum of pathways is often redundant and cell-type specific. differences in sensitivity to inhibitors, expression of dominant-negative mutants, and live-cell microscopy show that the pathways differ in their kinetics and their dependence on host-cell kinases; dynamins; rac-, rab-, and arf-family gtpases; actin and tubulin; and cholesterol. the pathways are also likely to be physiologically regulated (e.g., the upregulation of compensatory pathways when clathrin-mediated endocytosis is inhibited; damke et al., 1995). this may be one reason why viruses such as sv40 and influenza can use several different pathways, enabling them to infect a wide range of cells under various conditions. the question arises as to whether the pathways that have been identified in tissue culture systems, in some cases under unusual experimental conditions, occur in the relevant cells in vivo or whether there are additional endocytic entry mechanisms still to be discovered. it will take some effort to define how many different endocytic pathways operate in cells and their normal functional activities.
further analysis using sirna screens, with larger sirna libraries and other viruses, may allow the identification of unique sets of proteins that are required for the entry of viruses that use a specific endocytic mechanism, thus providing a genetic fingerprint for each pathway. such analyses should demonstrate whether these different pathways involve distinct molecular mechanisms or whether overlapping sets of proteins mediate several variants of essentially a single endocytic mechanism. the existence of ph-independent fusion proteins has an important consequence for some viruses. when these proteins are expressed on the cell surface, they can allow fusion of the infected cell with neighboring noninfected cells expressing appropriate receptors. this leads to formation of heterokaryons, or syncytia, and potentially allows for the dissemination of the virus without the formation of mature virus particles. nevertheless, in the majority of cases, the transfer of viral genomes from cell to cell appears to occur through the formation of virus particles that are released from infected cells and use the mechanisms described above to enter new uninfected hosts. interestingly, the herpes virus varicella-zoster (vzv) appears to use cell-free viruses to spread from human to human but direct cell-to-cell transfer, without formation of infectious virions, to spread within an infected host (chen et al., 2004) . most studies of viral entry are, by necessity, conducted in tissue culture systems using cell lines that differ from the cells that are the normal targets for infection in vivo. although these systems have contributed greatly to our understanding of virus entry, they do not fully replicate the in vivo scenario, and, as a consequence, crucial aspects of entry mechanisms may be missed. in experimental systems, the entry stage of infection is usually studied by adding cell-free virus particles to cells. 
often the efficiency of entry is low, and experimenters frequently enhance virus adsorption by including charged polymers, such as polybrene, in the medium or by gently centrifuging virus particles onto cells (o'doherty et al., 2000) . viruses that have adapted to growing in tissue culture cells can show an enhanced tendency to use proteoglycans for initial recruitment to cell surfaces (de haan et al., 2005; vlasak et al., 2005) . indeed, hiv-1 strains adapted to grow in t cell lines appear to bind to cells via proteoglycans before engaging cd4 and coreceptors (ugolini et al., 1999) . together this suggests that the events involved in virus adsorption and entry in tissue culture may be different from those that occur in vivo. a number of recent studies suggest why this is the case. transmission of viruses from cell to cell in epithelial cells, for example, may occur through the basolateral domains, where a virus released from one cell can be effectively deposited on adjacent cells, minimizing the events associated with virus recruitment to the cell surface. vaccinia virus (vv) provides an extreme example. vv is produced in several forms, of which the intracellular mature virus (imv) and extracellular enveloped virus (eev) are capable of infecting target cells. the eev is initially assembled with three membranes and is released to the cell surface when the outer membrane fuses with the plasma membrane to produce an extracellular double-membraned particle. the eev remains bound to the surface of the producer cell and induces the polymerization of cytoplasmic actin comets that push the virus around on the surface of the cell, often at the tips of filopodium-like projections (smith et al., 2003b; wolffe et al., 1998) (see figure 4 and greber and way, 2006 [this issue of cell] ). in confluent cultures, and presumably in vivo as well, these projections push viruses into neighboring cells and enhance cell-to-cell transfer. 
deletion of viral proteins involved in transmembrane signaling and actin nucleation reduces the viral plaque size in tissue culture and pathogenesis in vivo (smith et al., 2003b; wolffe et al., 1998). recently, structures termed infectious or virological synapses have been increasingly implicated in cell-to-cell virus transfer. initially described for the transfer of the human t cell leukemia virus type 1 (htlv-1) from infected to uninfected t cells, these areas of intimate contact between the infected and uninfected cells were recognized to have features in common with immunological synapses (igakura et al., 2003). in contrast to t cell killing or antigen presentation, virological-synapse formation between t cells does not require major histocompatibility antigens (jolly and sattentau, 2004). the synapse provides a domain where virus assembly can be focused to release particles efficiently to a target cell. similar means of transfer have been described for herpes viruses (johnson and huber, 2002) and for the t cell to t cell transfer of hiv (jolly and sattentau, 2004). a particularly intriguing example of infectious synapse function is that described for dendritic cells. for many years it has been recognized that a highly efficient way of infecting t cells with hiv in culture is to first present the virus to dendritic cells (cameron et al., 1992). hiv infection of dendritic cells can occur but is generally regarded as being inefficient. recent studies have suggested an alternative mode of interaction whereby viruses may be captured by c-type lectins (geijtenbeek et al., 2000; turville et al., 2002), and perhaps other receptors involved in antigen acquisition, and internalized into endocytic vesicles without infecting the dendritic cell.
this virus can be regurgitated when dendritic cells interact with cd4-positive t cells and is able to infect the t cell in trans through infectious or virological synapses, a process that mimics the normal course of antigen presentation (mcdonald et al., 2003). viruses can be harbored in endocytic vesicles for hours, if not days, without losing infectivity, potentially allowing the dendritic cells in vivo to undergo maturation and relocation from peripheral tissue, where they would initially encounter viruses, to lymphoid organs. the endocytic compartment in which endocytosed viruses are sequestered is morphologically complex and bears some similarity to the compartments in macrophages where hiv assembles (garcia et al., 2005). although trans-infection has been documented in tissue culture, it remains unclear whether it operates in the same manner in vivo where, for hiv at least, infection of dendritic cells and the production of viruses in infected dendritic cells may be more relevant (turville et al., 2004). regardless, the transfer of viruses through infectious synapses is likely to operate in both scenarios. similar dendritic-cell-mediated trans-infection routes have been proposed for a number of other viruses, including herpes, filo-, and flaviviruses (van kooyk and geijtenbeek, 2003). thus, these viruses have adapted to exploit a loophole in the defense strategies ranged against them by effectively using the cell's endocytosis and/or secretory mechanisms to allow their release to be temporally and spatially coordinated to focus infectious viruses on target cells. it is likely that similar strategies are used in various situations by different viruses; in particular, following reactivation of latent viruses, the transfer of hsv and varicella-zoster virus (chicken pox) from infected neurones to epithelial cells is likely to exploit specialized cell junctions (johnson and huber, 2002).
thus, although the full life cycle of a virus requires assembly, dissemination by release from cells, and entry into new targets, the time during which viruses are genuinely cell free may be minimal. these mechanisms may limit the extent to which a virus is exposed to the humoral immune system (johnson and huber, 2002) and may also explain, for example, why, in hiv patients, infected t cells harbor multiple proviral genomes, suggestive of cells having undergone multiple, reasonably synchronous infection events (jung et al., 2002). many viruses only infect a restricted range of cells, limiting their species tropism and cell tropism. frequently, the restriction on infection is due to the absence of appropriate receptors. with retroviruses, however, a number of other mechanisms to limit viral replication have been discovered, and many of these operate at the level of entry (towers and goff, 2003). for hiv in particular, several mechanisms have now been described through which virus replication can be limited independently of receptor expression. cellular rna-editing enzymes can be packaged into virus particles and can introduce mutations in the viral genome during the reverse-transcription event that occurs shortly after virus entry. hiv-1 uses its vif protein to overcome this restriction by preventing packaging of the editing enzymes into virus particles (sheehy et al., 2002). in another example, the so-called lv1 (or ref1) restriction that prevents hiv-1 replication in old world monkeys is mediated by a homo-oligomeric protein called trim5α (stremlau et al., 2004; towers, 2005). trim5α appears to bind to an exposed loop on the capsid protein that becomes available when the capsid is released to the cytoplasm after fusion (ylinen et al., 2005). trim5α appears to prevent uncoating of the incoming particles by crosslinking the capsid protein subunits, though in some cases it may operate after reverse transcription (ylinen et al., 2005).
the fv1 restriction seen with murine leukemia viruses operates after reverse transcription and prevents nuclear entry (towers et al., 2000). interestingly, the fv1 restriction maps to an endogenous retroviral gag gene (best et al., 1996), suggesting that interaction between incoming capsids and an endogenous gag protein prevents infection. in another form of restriction, now termed lv2, some strains of hiv-1 and hiv-2 fail to replicate efficiently in hela cells. the phenotype maps to specific amino acids in both env and capsids. for the env-mediated restriction, at least, the failure to efficiently infect cells appears to be related to virus delivery to a raft-mediated endocytic pathway, and experimental manipulations that disrupt raft-mediated endocytosis overcome the restriction, as does diverting the virus to a clathrin-mediated pathway by pseudotyping the virus with the vsv-g protein (marchant et al., 2005; schmitz et al., 2004). thus, not only must viruses find their way to the correct pathway for infection, but diversion to an alternative pathway, or blind alley, can limit the potential for subsequent infection and replication. other mechanisms through which the mode of penetration determines the outcome of subsequent events in virus entry appear to operate for other related viruses (kim et al., 2001). among the many challenges for the future is the need to identify exactly how many endocytic mechanisms contribute to virus entry and to determine which viruses use these pathways. rnai screens similar to those described by pelkmans et al. (2005) will allow a systematic approach to these questions with the possibility that specific routes can be defined by the requirement for a unique set of genes and their encoded proteins. thus, each endocytic mechanism might be given a unique genetic fingerprint.
understanding these processes may allow specific cellular pathways or molecular machines to be targeted pharmacologically to inhibit the entry of any virus that uses the route for infection. such an approach may offer the advantage that it will be more difficult for a virus to find a way around the block by mutation. it will also be important to show that the pathways identified in tissue culture experiments operate in vivo during a normal infection. such experiments will be difficult, especially with human viruses, but inhibitors identified through tissue culture experiments may make it possible to address these issues. at the level of cells and virus particles, developments in morphological techniques from high-end light microscopy to electron microscopy are allowing the events involved in virus entry to be analyzed with increased spatial and temporal resolution. single-particle tracking of virus particles containing components labeled with different fluorochromes or fluorescent proteins is allowing specific events to be followed in remarkable detail. for some viruses with high infectious-unit-to-particle ratios, such as sfv, such studies provide direct information on entry. more careful attention to the preparation and storage of other viruses may allow similar morphological experiments to be interpreted with the confidence that the events observed represent the behavior of infectious particles. the ability to use microscopes in high-throughput screens for inhibitors of virus entry or to determine key components in the entry pathway will have a far-reaching impact on our understanding of virus entry. together, these studies should provide detailed insights to the cellular mechanisms of endocytosis. the identification of natural ligands will also be key in establishing the biological relevance of these pathways. if we revisit this topic in 5 years' time, we will have a very much more detailed picture of virus entry. 
viruses will continue to hold crucial cards in our attempts to understand cells and their pathogens.

acknowledgments
m.m. is supported by the uk medical research council. a.h. is supported by the swiss national research foundation, the european union fifth framework (eurogenedrug), and the eth zurich. we thank annegret pelchen-matthews, stefan moese, peter rottier, varpu marjomäki, urs greber, and thomas kershaw for critical comments on the manuscript, and we apologize to colleagues whose work has not been directly cited due to space limitations.

membrane insertion of anthrax protective antigen and cytoplasmic delivery of lethal factor occur at different stages of the endocytic pathway ultrastructural study of rotavirus replication in cultured cells bound simian virus 40 translocates to caveolin-enriched membrane domains, and its entry is inhibited by drugs that selectively disrupt caveolae chemokine receptors as hiv-1 coreceptors: roles in viral entry, tropism, and disease positional cloning of the mouse retrovirus restriction gene fv1 mechanism of lymphocytic choriomeningitis virus entry into cells the structure of the poliovirus 135s cell entry intermediate at 10-angstrom resolution reveals the location of an externalized polypeptide that binds to membranes large-plaque mutants of sindbis virus show reduced binding to heparan sulfate, heightened viremia, and slower clearance from the circulation dendritic cells exposed to human immunodeficiency virus type-1 transmit a vigorous cytopathic infection to cd4+ t cells endosomal proteolysis of the ebola virus glycoprotein is necessary for infection mannose 6-phosphate receptor dependence of varicella zoster virus infection in vitro and in the epidermis during varicella and zoster extracellular simian virus 40 transmits a signal that promotes virus enclosure within caveolae virus-induced abl and fyn kinase signals permit coxsackievirus entry through epithelial tight junctions receptor-triggered membrane association of a 
model retroviral glycoprotein clathrin-independent pinocytosis is induced in cells overexpressing a temperature-sensitive mutant of dynamin clathrin-and caveolin-1-independent endocytosis: entry of simian virus 40 into cells devoid of caveolae murine coronavirus with an extended host range uses heparan sulfate as an entry receptor the role of the cytoskeleton during viral infection the many mechanisms of viral membrane fusion proteins cathepsin l and cathepsin b mediate reovirus disassembly in murine fibroblast cells endocytosis by random initiation and stabilization of clathrin-coated pits hiv-1 trafficking to the dendritic cell-t-cell infectious synapse uses a pathway of tetraspanin sorting to the immunological synapse dc-sign, a dendritic cell-specific hiv-1-binding protein that enhances trans-infection of t cells signalling in viral entry a superhighway to virus infection mechanism of membrane fusion by viral envelope proteins on the entry of semliki forest virus into bhk-21 cells activation of a retroviral membrane fusion protein: soluble receptor-induced liposome binding of the alsv envelope glycoprotein poliovirus cell entry: common structural themes in viral cell entry pathways spread of htlv-i between lymphocytes by virus-induced polarization of the cytoskeleton cholesterol is required for endocytosis and endosomal escape of adenovirus type 2 subversion of cellular autophagosomal machinery by rna viruses directed egress of animal viruses promotes cell-to-cell spread retroviral spread by induction of virological synapses multiply infected spleen cells in hiv patients endocytosis of simian virus 40 into the endoplasmic reticulum virus membrane-fusion proteins: more than one way to make a hairpin use of helper-free replication-defective simian immunodeficiency virus-based vectors to study macrophage and t tropism: evidence for distinct levels of restriction in primary macrophages and a t-cell line ultrastructural identification of uncoated 
caveolin-independent early endocytic vehicles adaptation of sindbis virus to bhk cells selects for use of heparan sulfate as an attachment receptor structure of an hiv gp120 envelope glycoprotein in complex with the cd4 receptor and a neutralizing human antibody visualizing infection of individual influenza viruses endosome-to-cytosol transport of viral nucleocapsids actin-and myosin-driven movement of viruses along filopodia precedes their entry into cells adenovirus endocytosis requires actin cytoskeleton reorganization mediated by rho family gtpases characterization of human immunodeficiency virus type 1 gp120 binding to liposomes containing galactosylceramide erp29 triggers a conformational change in polyomavirus to stimulate membrane binding an envelope-determined, ph-independent endocytic route of viral entry determines the susceptibility of human immunodeficiency virus type 1 (hiv-1) and hiv-2 to lv2 restriction sfv infection in cho cells: cell-type specific restrictions to productive virus entry at the cell surface virus entry into animal cells recruitment of hiv and its receptors to dendritic cell-t cell junctions adenovirus triggers macropinocytosis and endosomal leakage together with its clathrin-mediated uptake retroviral entry mediated by receptor priming and low ph triggering of an envelope glycoprotein role of alpha(v) integrins in adenovirus cell entry and gene delivery. 
microbiol human immunodeficiency virus type 1 spinoculation enhances infection through virus binding role of lipid rafts in virus replication ultrastructural localization of gangliosides; gm1 is concentrated in caveolae lipid rafts and caveolae as portals for endocytosis: new insights and common mechanisms caveolar endocytosis of simian virus 40 reveals a new two-step vesicular-transport pathway to the er local actin polymerization and dynamin recruitment in sv40-induced internalization of caveolae caveolinstabilized membrane domains as multifunctional transport and sorting devices in endocytic membrane traffic genome-wide analysis of human kinases in clathrin-and caveolae/raft-mediated endocytosis echovirus 1 endocytosis into caveosomes requires lipid rafts, dynamin ii, and signaling events viral entry, lipid rafts and caveosomes the neutral glycosphingolipid globotriaosylceramide promotes fusion mediated by a cd4-dependent cxcr4-utilizing hiv type 1 envelope glycoprotein adaptable adaptors for coated vesicles picornavirus-receptor interactions assembly of endocytic machinery around individual influenza viruses during viral entry viral rna replication in association with cellular membranes lv2, a novel postentry restriction, is mediated by both capsid and envelope selective stimulation of caveolar endocytosis by glycosphingolipids and cholesterol isolation of a human gene that inhibits hiv-1 infection and is suppressed by the viral vif protein interaction with decay-accelerating factor facilitates coxsackievirus b infection of polarized epithelial cells influenza virus can enter and infect cells in the absence of clathrin-mediated endocytosis inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry receptor binding and membrane fusion in virus entry: the influenza hemagglutinin how viruses enter animal cells ganglioside-dependent cell attachment and endocytosis of murine polyomavirus-like particles vaccinia virus motility major 
histocompatibility complex class i molecules mediate association of sv40 with caveolae high-resolution structure of a polyomavirus vp1-oligosaccharide complex: implications for assembly and receptor binding the structure of simian virus 40 refined at 3.1 a resolution virus maturation: dynamics and mechanism of a stabilizing structural transition that leads to infectivity structural basis of nonenveloped virus cell entry the cytoplasmic body component trim5al-pha restricts hiv-1 infection in old world monkeys assembly and trafficking of caveolar domains in the cell: caveolae as stable, cargo-triggered, vesicular transporters a conserved mechanism of retrovirus restriction in mammals control of viral infectivity by tripartite motif proteins. hum post-entry restriction of retroviral infections retro-translocation of proteins from the endoplasmic reticulum into the cytosol gangliosides are receptors for murine polyoma virus and sv40 diversity of receptors binding hiv on dendritic cell subsets immunodeficiency virus uptake, turnover, and 2-phase transfer in human dendritic cells hiv-1 attachment: another look clustering induces a lateral redistribution of alpha 2 beta 1 integrin from membrane rafts to caveolae and subsequent protein kinase c-dependent internalization dc-sign: escape mechanism for pathogens human rhinovirus type 89 variants use heparan sulfate proteoglycan for cell attachment rab7 associates with early endosomes to mediate sorting and transport of semliki forest virus to late endosomes role for the vaccinia virus a36r outer envelope protein in the formation of virus-tipped actin-containing microvilli and cell-to-cell virus spread differential restriction of human immunodeficiency virus type 2 and simian immunodeficiency virus sivmac by trim5alpha alleles virus entry and uncoating adenovirus receptors key: cord-255856-0xqllbzz authors: florman, harvey m.; wassarman, paul m. 
title: o-linked oligosaccharides of mouse egg zp3 account for its sperm receptor activity date: 1985-05-31 journal: cell doi: 10.1016/0092-8674(85)90084-4 sha: doc_id: 255856 cord_uid: 0xqllbzz abstract previously, we reported that zp3, one of three different glycoproteins present in the mouse egg's zona pellucida, serves as a sperm receptor. furthermore, small glycopeptides derived from egg zp3 retain full sperm receptor activity, suggesting a role for carbohydrate, rather than polypeptide chain, in receptor function. here, we report that removal of o-linked oligosaccharides from zp3 destroys its sperm receptor activity, whereas removal of n-linked oligosaccharides has no effect. a specific size class of o-linked oligosaccharides, recovered following mild alkaline hydrolysis and reduction of zp3, is shown to possess sperm receptor activity and to bind to sperm. the results presented strongly suggest that mouse sperm bind to eggs via o-linked oligosaccharides present on zp3. cellular adhesion is central to a range of morphogenetic, differentiative, and homeostatic processes. consequently, considerable effort has been directed towards identification and characterization of species of macromolecules that mediate such associations. it is in this general context that we have studied the interaction between mouse sperm and eggs just prior to fertilization. for fertilization to occur, mammalian sperm must penetrate the egg's extracellular coat, the zona pellucida. in the mouse, this coat is about 7 μm thick and consists of three different glycoproteins, zp1, zp2, and zp3, that are coordinately synthesized and secreted by growing oocytes (bleil and wassarman, 1980b, 1980c; greve et al., 1982; salzmann et al., 1983; roller and wassarman, 1983; shimizu et al., 1983; wassarman et al., 1984a; greve and wassarman, 1985). to penetrate the zona pellucida, sperm first bind to its outer margin. 
subsequently, a secretory response, the acrosome reaction, is triggered, enabling passage of sperm through the extracellular coat. finally, sperm contact and fuse with the egg's plasma membrane (gwatkin, 1977; saling and storey, 1979; wassarman and bleil, 1982; florman and storey, 1982; bleil and wassarman, 1983). several lines of evidence suggest that specific sperm receptors are present in zonae pellucidae and are necessary mediators of binding, the initial phase of gamete interaction (gwatkin, 1977; gulyas and schmell, 1981; yanagimachi, 1981; wassarman and bleil, 1982; schmell et al., 1983; wassarman et al., 1984b, 1985). when constituents of mouse egg zonae pellucidae were assayed individually for sperm receptor activity in vitro, zp3 alone was found to be functional (bleil and wassarman, 1980a, 1983; wassarman and bleil, 1982). this glycoprotein (83 kd) consists of a 44 kd molecular weight polypeptide chain, to which 3 or 4 n-linked oligosaccharides are added (salzmann et al., 1983; wassarman et al., 1984a). a variety of circumstantial evidence suggests that zp3 also contains o-linked oligosaccharides (wassarman et al., 1984a). although embryo zonae pellucidae also contain zp3, the glycoprotein does not have sperm receptor activity (bleil and wassarman, 1980a, 1983; wassarman and bleil, 1982). this behavior of zp3 from mouse eggs and embryos is consistent with the fact that sperm bind to eggs, but not to embryos (gwatkin, 1977; yanagimachi, 1981). recently, we reported that small glycopeptides derived from egg zp3 retain full sperm receptor activity (florman et al., 1984). this, as well as other observations, suggests that the sperm receptor activity of zp3 is attributable to its carbohydrate components, rather than to its polypeptide chain. here, we describe results of experiments that examine directly the role of carbohydrate in zp3 function. 
these results strongly suggest that o-linked oligosaccharides are present on zp3 and are essential for its sperm receptor activity. a preliminary account of some of these results has appeared (florman and wassarman, 1983).

rationale

the mouse egg's zona pellucida consists of three different glycoproteins, designated zp1 (200 kd), zp2 (120 kd), and zp3 (83 kd) (bleil and wassarman, 1980c). previously, we demonstrated that only zp3 exhibits sperm receptor activity in an in vitro competition assay, and that it accounts for virtually all sperm receptor activity present in egg zonae pellucidae (bleil and wassarman, 1980a). furthermore, we found that even relatively small glycopeptides (1.5-6 kd) derived from zp3, following extensive pronase digestion, exhibit full sperm receptor activity in vitro (florman et al., 1984). these and other observations (wassarman et al., 1984b) suggest that the sperm receptor activity of egg zp3 is dependent on its carbohydrate components rather than on its polypeptide chain. the experiments described here were carried out to demonstrate this directly and to identify the class of oligosaccharides involved. in these experiments, a competition assay (bleil and wassarman, 1980a; florman et al., 1984) was used to determine the ability of oligosaccharides, derived from either solubilized zonae pellucidae or purified zona pellucida glycoproteins, to inhibit the binding of sperm to eggs in vitro ("sperm receptor activity").

effect of trifluoromethanesulfonic acid on sperm receptor activity

the role of carbohydrate in sperm receptor activity was first evaluated by extensive deglycosylation of zona pellucida glycoproteins with trifluoromethanesulfonic acid (tfms). this reagent breaks glycosidic bonds between adjacent monosaccharides, as well as o-glycosidic linkages between carbohydrate and amino acids (serine and threonine), but does not cleave asparaginyl:n-acetyl-d-glucosaminyl amide linkages (edge et al., 1981).
consequently, glycoproteins treated with tfms are virtually denuded of carbohydrate, with only asparagine-linked n-acetylglucosamine remaining associated with the polypeptide chain. the sperm receptor activity of egg zonae pellucidae, as measured by an in vitro competition assay (see experimental procedures), is extremely sensitive to tfms. solubilized zonae pellucidae (2/μl) exposed only to tfms buffers inhibited sperm binding by more than 60% (11.5 ± 4.6 sperm bound/egg) as compared with controls (i.e., no zonae pellucidae; 29.7 ± 5.7 sperm bound/egg), a value similar to that observed with untreated zonae pellucidae (8.7 ± 4.2 sperm bound/egg). in contrast, zonae pellucidae treated with tfms inhibited sperm binding by less than 10% (26.9 ± 5.1 sperm bound/egg). electrophoretic analyses of tfms-treated zonae pellucidae confirmed that the mature form of zp3 (83 kd) had been converted to a species with a molecular weight approximating that of the polypeptide chain (44 kd; salzmann et al., 1983). these data indicate that removal of both n- and o-linked oligosaccharides from zp3 results in elimination of its sperm receptor activity.

effect of endo-β-n-acetyl-d-glucosaminidase f on sperm receptor activity

in view of the tfms results described above, endo-β-n-acetyl-d-glucosaminidase f (endo f) was used to determine whether or not removal of only n-linked oligosaccharides from zp3 affected its sperm receptor activity. endo f cleaves glycosidic bonds in the diacetylchitobiosyl core region of both high-mannose and complex type n-linked oligosaccharides, but does not alter o-linked carbohydrates of glycoproteins (elder and alexander, 1982). previously, we reported that there are two forms of mature zp3, one form possessing 3, and the other possessing 4, n-linked oligosaccharides per polypeptide chain (salzmann et al., 1983).
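the percent inhibition figures quoted for the tfms experiment above can be checked from the reported mean sperm-bound/egg counts. the sketch below is only an illustration of that arithmetic, not the authors' procedure: the function name and the dictionary labels are hypothetical, only the mean values reported in the text are used, and the standard deviations are ignored.

```python
def percent_inhibition(bound_with_competitor: float, bound_control: float) -> float:
    """Percent reduction in sperm bound/egg relative to the no-zonae control."""
    if bound_control <= 0:
        raise ValueError("control binding must be positive")
    return 100.0 * (1.0 - bound_with_competitor / bound_control)


# Mean sperm bound/egg as reported in the text (standard deviations omitted).
CONTROL = 29.7  # no zonae pellucidae added
SAMPLES = {
    "tfms buffers only": 11.5,
    "untreated zonae": 8.7,
    "tfms-treated zonae": 26.9,
}

for label, bound in SAMPLES.items():
    print(f"{label}: {percent_inhibition(bound, CONTROL):.1f}% inhibition")
# tfms buffers only: 61.3% inhibition
# untreated zonae: 70.7% inhibition
# tfms-treated zonae: 9.4% inhibition
```

these values agree with the statements that buffer-exposed zonae pellucidae inhibited binding by more than 60% and tfms-treated zonae pellucidae by less than 10%.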
in the experiments that follow, the behavior of endo f-treated zp3 was compared with that of another zona pellucida glycoprotein, zp2 (6 n-linked oligosaccharides per polypeptide chain; greve et al., 1982), and with that of total egg zonae pellucidae (zp1, zp2, and zp3). both purified zp2 and zp3 are substrates for endo f. following extensive digestion, both zp2 and zp3 migrated as broad bands on sds-polyacrylamide gels, with their apparent molecular weights decreased by about 40 kd and 30 kd, respectively, as the result of endo f treatment (figure 1). as expected, the apparent molecular weights of endo f-treated zp2 and zp3 were higher than those of their polypeptide chains (81 kd and 44 kd, respectively; greve et al., 1982; salzmann et al., 1983). [...] zonae pellucidae, when tested by the in vitro competition assay; similarly, endo f-treated egg zonae pellucidae retained full receptor activity (table 1). it was noted that neither endo f-treated zp3 nor endo f-treated zonae pellucidae affected the fraction of motile sperm present in these experiments. similarly, based simply on microscopic examination, the speed and patterns of movement of the sperm used in these experiments were apparently unaffected. furthermore, bsa, hcg, zp2, and embryo zonae pellucidae, all of which lack any sperm receptor activity, continued to lack activity following treatment with endo f (data not shown). finally, sequential digestion of egg zonae pellucidae with endo f and pronase had no effect on sperm receptor activity, minimizing the possibility that a peptide domain possessing sperm receptor activity is rendered resistant to proteolysis by the presence of n-linked oligosaccharides (olden et al., 1982). these results strongly suggest that n-linked oligosaccharides are not involved in the receptor activity of zp3.
table 1. surviving entries: 10 ± 1 (31 ± 3); 10 ± 2 (31 ± 5). * sperm receptor activity was assayed as described in experimental procedures. egg zonae pellucidae and egg zp3 were present at a concentration of 2 zonae pellucidae/μl. in these experiments, binding of sperm incubated with msecm (control) and with msecm containing 2 zonae pellucidae/μl was 30 ± 6 and 10 ± 2 sperm bound/egg, respectively. † samples (all in distilled water) were lyophilized, were resuspended in endo f buffers, were either heat inactivated (100°c, 1 min) or had active endo f added, were incubated 4 hr at 37°c, and were boiled 1 min as described in experimental procedures. samples were dialyzed, first against 7 m urea and then against distilled water, were lyophilized, were resuspended in msecm, and were then assayed for sperm receptor activity. ‡ these data represent the mean ± sd of 2 to 6 individual experiments.

effect of alkali on zona pellucida glycoproteins zp2 and zp3

the o-glycosidic bond between n-acetyl-d-galactosamine and the β-hydroxyamino acids, serine and threonine, is relatively sensitive to alkaline cleavage by a β-elimination type reaction (sharon, 1975). in view of the results obtained with tfms and endo f-treated zp3 (described above), we examined whether or not removal of o-linked oligosaccharides by mild alkaline hydrolysis affected the sperm receptor activity of zp3. to find conditions under which alkali released carbohydrate from zona pellucida glycoproteins but did not break peptide bonds, egg zonae pellucidae were lyophilized, were resuspended in various concentrations of naoh, and were incubated at 37°c for 16 hr. following alkaline hydrolysis, zona pellucida solutions were neutralized, were radioiodinated, and were subjected to sds-page, as described in experimental procedures. we found that concentrations of naoh typically used in β-elimination reactions (50-100 mm) resulted in extensive degradation of the polypeptide chains of zp2 and zp3 under the conditions described here (data not shown).
however, based on sds-page, 5 mm naoh had the desired effect on zp2 and zp3, reducing their molecular weights by less than 10 kd (figure 2), whereas 0.5 mm and lower concentrations of naoh did not have a detectable effect on the molecular weights of zp2 and zp3. when zonae pellucidae were radiolabeled prior to treatment with 5 mm naoh, 75-90% of the radiolabel initially associated with zp2 and zp3 was recovered in the lower molecular weight forms of the glycoproteins; the loss of radiolabel probably reflects alkali-catalyzed release of 125i, although the possibility of a low level of peptide hydrolysis has not been completely eliminated. finally, it was noted that zp1 largely disappeared following exposure to 5 mm naoh (figure 2). we attribute this apparent loss to alkali-catalyzed reduction of the intermolecular disulfide bond of zp1 (putnam, 1954), resulting in formation of a species that comigrates with zp2 (bleil and wassarman, 1980c; wassarman et al., 1984a).

figure 2 legend. egg zonae pellucidae were solubilized, were lyophilized, were resuspended in either 5 mm naoh or distilled water, and were incubated for 16 hr at 37°c as described in experimental procedures. radiolabeled samples were run on sds-page, under nonreducing conditions, and were autoradiographed. shown are autoradiographs of untreated zonae pellucidae (lane a), zonae pellucidae incubated in the presence of distilled water (lane b), and zonae pellucidae incubated in the presence of 5 mm naoh (lane c). positions of the origin of the gel (o), zp1, zp2, and zp3 are indicated. sperm receptor activity was measured by using the in vitro competition binding assay described in experimental procedures. zonae pellucidae incubated in the presence of either distilled water or 5 mm naoh were lyophilized, were resuspended in culture medium, and were incubated with sperm, in the range of 1-6 zonae pellucidae/μl, as described in experimental procedures. the ability of untreated and naoh-treated zonae pellucidae to competitively inhibit binding of sperm to eggs is shown. these data represent the mean ± standard deviation of triplicate experiments in which the control level of sperm binding was 33.1 ± 5.4 sperm bound/egg (i.e., at 0 zonae pellucidae/μl).

following β-elimination in the presence of a strong reducing agent such as nabh4, the elimination products of the glycosidically linked amino acids, serine and threonine, are alanine and α-aminobutyric acid, respectively (sharon, 1975). in view of the molecular weight shifts of zp2 and zp3 following 5 mm naoh treatment, we determined whether or not serine and threonine were converted into alanine and α-aminobutyric acid under these conditions. purified egg zp2 and zp3 were incubated in the presence of 5 mm naoh and 1 m 3h-nabh4 at 37°c for 16 hr (mild alkaline reduction), were acid hydrolyzed, and were subjected to two-dimensional thin layer chromatography as described in experimental procedures. incorporation of 3h into alanine and α-aminobutyric acid was compared using zona pellucida glycoproteins exposed to 3h-nabh4 in both the presence and absence of alkali. as a control glycoprotein, hcg was treated and was analyzed under identical experimental conditions. results of the analyses described above were expressed as incorporation ratios (cpm(naoh)/cpm(h2o)) for alanine, for α-aminobutyric acid, and for other amino acids. in the case of hcg, incorporation ratios of 2.19 and 0.93 were determined for alanine and α-aminobutyric acid, respectively, indicating the presence of serine-, but not threonine-linked oligosaccharides. this result is in agreement with reports that hcg is o-glycosylated only at serine residues (kessler et al., 1979b). the average incorporation ratio for all other amino acids examined was 1.02 ± 0.23, indicating that no other amino acids in hcg were radiolabeled in an alkali-specific manner. incorporation ratios for purified zp2 and zp3 are presented in table 2.
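as an illustration of how the incorporation ratio described above (3h counts recovered with alkali divided by counts recovered without alkali) separates alkali-specific labeling from background, a minimal sketch follows. it is not the authors' analysis: the function names are hypothetical, the raw cpm values in the usage lines are made up for the example, and the cutoff of two standard deviations above the background mean of 1.02 ± 0.23 is an assumption introduced here, since the text gives no explicit criterion.

```python
def incorporation_ratio(cpm_naoh: float, cpm_h2o: float) -> float:
    """3H counts incorporated with alkali divided by counts without alkali."""
    if cpm_h2o <= 0:
        raise ValueError("cpm_h2o must be positive")
    return cpm_naoh / cpm_h2o


def is_alkali_specific(ratio: float,
                       background_mean: float = 1.02,
                       background_sd: float = 0.23,
                       n_sd: float = 2.0) -> bool:
    """True if the ratio exceeds background mean + n_sd * SD.

    The background mean and SD are the values reported for the other amino
    acids of hcg; the n_sd = 2 cutoff is an assumption made for this sketch.
    """
    return ratio > background_mean + n_sd * background_sd


# Ratios reported for hcg in the text.
print(is_alkali_specific(2.19))  # alanine (from serine)
print(is_alkali_specific(0.93))  # alpha-aminobutyric acid (from threonine)

# Made-up raw counts, used only to show how a ratio would be computed.
print(round(incorporation_ratio(2190.0, 1000.0), 2))
```

with the assumed cutoff (1.48), the hcg alanine ratio is flagged as alkali-specific and the α-aminobutyric acid ratio is not, which matches the interpretation that hcg carries serine-linked but not threonine-linked oligosaccharides.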
for both zp2 and zp3, ratios significantly greater than 1.0 were determined for alanine and for α-aminobutyric acid, but not for other amino acids. these results strongly suggest that both of these glycoproteins possess serine- and threonine-linked oligosaccharides that are released on mild alkaline hydrolysis. experimental evidence concerning the nature of the carbohydrate released is described below.

table 2 note (fragment). [...] either amino acids or their derivatives. the ranges of values for zp2 and zp3 were 0.6-1.6 and 0.9-1.3, respectively. the range of values for background (i.e., ninhydrin-negative regions of the chromatograms) was 0.7-1.4.

results of experiments presented above indicated that, by using mild alkali, o-linked oligosaccharides could be removed from zp3 without causing extensive degradation of its polypeptide chain. accordingly, egg zonae pellucidae were exposed to 5 mm naoh at 37°c for 16 hr and tested for sperm receptor activity in the in vitro competition assay. in these experiments, 1-2% of the zonae pellucidae were radiolabeled with 125i-bolton-hunter reagent. this permitted determination of the percentage of treated material recovered and, therefore, estimation of zona pellucida concentrations used in sperm receptor assays. we found that egg zonae pellucidae retained virtually full sperm receptor activity, as compared with untreated material, following incubation at 37°c for 16 hr in the presence of distilled water (figure 2). furthermore, addition of 5 mm naoh, which had been neutralized and lyophilized, to culture medium had no effect on binding of sperm to eggs in the competition assay. on the other hand, egg zonae pellucidae subjected to mild alkaline hydrolysis, under the conditions described above, lost approximately 90% of the sperm receptor activity present in untreated samples and in samples treated only with distilled water (figure 2).
similarly, mild alkaline hydrolysis of endo f-treated zonae pellucidae resulted in the loss of about 90% of sperm receptor activity in such samples (data not shown). in no case did the addition of untreated or treated zonae pellucidae affect sperm motility. the results of these experiments strongly suggest that removal of o-linked oligosaccharides from zp3 causes the loss of sperm receptor activity.

[...] of o-linked oligosaccharides possessing sperm receptor activity

as a result of alkaline hydrolysis of serine/threonine:n-acetyl-d-galactosaminyl bonds in glycoproteins, liberated oligosaccharides undergo degradation stepwise from their reducing termini. this so-called "peeling reaction" can be minimized by including a strong reducing agent, such as nabh4, during the hydrolysis (lloyd, 1976). under these conditions, the reducing termini of released oligosaccharides are rapidly converted to alkali-stable sugar alcohols. in the presence of 3h-nabh4, the released oligosaccharides are recovered as 3h-labeled alcohols. we used such a procedure to obtain radiolabeled oligosaccharides that possess sperm receptor activity from purified zp3. initial experiments demonstrated that radiolabeled (3h-nabh4) oligosaccharides, possessing sperm receptor activity, could be recovered from alkaline hydrolysates of egg zonae pellucidae. in these experiments, peptides, na+, and borates were removed, samples were subjected to gel filtration, and fractions were assayed for sperm receptor activity in the in vitro competition assay as described in experimental procedures. following gel filtration of zona pellucida hydrolysates on bio-gel p-2, sperm receptor activity was found associated with the pooled void volume, but not with the included volume material. aliquots of void volume material decreased binding of sperm to eggs by 75% relative to control samples (8.3 ± 4.7 versus 33.5 ± 5.6 sperm bound/egg, respectively). the latter samples included untreated sperm, as well as sperm exposed to aliquots of pooled void volume material obtained following gel filtration of alkaline-borohydride hydrolysates not containing zonae pellucidae (i.e., "sham hydrolysates"). it was noted in these experiments that, while void volume fractions had no effect on sperm motility, material comigrating with 3h-mannose on bio-gel p-2 completely inhibited sperm motility; this inhibitory effect was even seen with sham hydrolysates, suggesting that it can be attributed to contaminants present in 3h-nabh4.

figure 3 legend ([...] released from zp2). egg zp2 was purified and subjected to mild alkaline reduction in the presence of 3h-nabh4 as described in experimental procedures. oligosaccharides, separated from peptides, na+, and borates (see experimental procedures), were first chromatographed on bio-gel p-2 (0.7 × 17 cm); void volume fractions were pooled, were lyophilized, were resuspended, and were then chromatographed on bio-gel p-6 (1.5 × 70 cm). bio-gel p-6 columns were developed with distilled water at 55°c and at a flow rate of 20 ml/hr. an elution profile for radiolabeled oligosaccharides was determined by counting 10 μl aliquots of each 1.5 ml column fraction. in addition, two 700 μl aliquots of each column fraction were lyophilized, were resuspended in culture medium, and each was assayed for sperm receptor activity by using the in vitro competition binding assay. the control level of sperm binding was 30.5 ± 8.9 sperm bound/egg in the experiment shown. ferritin and 3h-borohydride eluted in regions i (void volume) and v, respectively.

in view of the results just described, purified zp2 and zp3 were subjected to mild alkaline hydrolysis in the presence of 3h-nabh4, followed by gel filtration. void volume fractions recovered from bio-gel p-2 columns were then pooled (as above) and were applied to bio-gel p-6.
the p-6 elution profiles for zp2 and zp3 are shown in figure 3 and in figure 4, respectively, together with the results of sperm receptor activity measurements for each fraction eluted. receptor activity was determined on an aliquot of each fraction that had been lyophilized and had then been resuspended in culture medium, as described in experimental procedures. while the elution profile for zp2 (figure 3) was similar to that for zp3 (figure 4), as expected, only fractions from zp3 possessed sperm receptor activity, and the activity was associated with a single region of the elution profile (region ii) (figure 4). in two independent experiments, oligosaccharides eluted in region ii (~3.4-4.6 kd apparent molecular weight) accounted for about 35% of the sperm receptor activity associated initially with intact zp3.

figure 4 legend. these experiments were carried out with purified zp3, radiolabeled with 3h-nabh4, exactly as described in the legend to figure 3. shown are the bio-gel p-6 elution profile for radiolabeled oligosaccharides and the sperm receptor activity profile. the region of the elution profile displaying sperm receptor activity (region ii) is stippled. the control level of sperm binding was 53.9 ± 7.8 sperm bound/egg in the experiment shown. ferritin and 3h-borohydride eluted in regions i (void volume) and v, respectively.

in a control experiment, hcg was subjected to alkaline-borohydride hydrolysis and to gel filtration on bio-gel p-2 and p-6 under the same conditions used for purified zp3; fractions eluted from p-6 had no effect on binding of sperm to eggs in vitro. furthermore, when egg zonae pellucidae were treated with borohydride in the absence of alkali, no sperm receptor activity was found in fractions eluted from p-6 columns (data not shown). under these conditions, o-linked carbohydrate remained associated with peptide and was removed during cation exchange chromatography (see experimental procedures).
these results strongly suggest that release of sperm receptor activity from zp3 by mild alkaline reduction did not result from borohydride side reactions; these can include release of n-linked oligosaccharides (rasilo and renkonen, 1981; ogata and lloyd, 1982) and cleavage of peptide bonds (crestfield et al., 1963; shimamura et al., 1984).

binding of o-linked oligosaccharides to sperm

the results described above strongly suggest that a specific size class of o-linked oligosaccharides, derived from zp3 and possessing sperm receptor activity, can be fractionated on bio-gel p-6. to determine whether or not these oligosaccharides bind to sperm, the experiments that follow were carried out using purified zp2 and zp3. these experiments involved incubation of sperm with radiolabeled oligosaccharides released from either zp2 or zp3, centrifugation of the sperm through dibutyl phthalate into sucrose-triton x-100, and gel filtration, first on bio-gel p-2 and then on p-6, as described in experimental procedures. the bio-gel p-6 elution profiles for 3h-oligosaccharides released from zp2 and zp3 are shown in figure 5 and in figure 6, respectively. in each case, profiles are presented both for total oligosaccharides and for oligosaccharides associated with sperm following a brief incubation. the results obtained with zp2 oligosaccharides were virtually identical with those presented in figure 3; no particular size class of zp2 oligosaccharides was selectively bound to sperm. on the other hand, while the profile of total zp3 oligosaccharides closely resembled that presented in figure 4, the profile of sperm-associated oligosaccharides differed. the latter material was significantly enriched in the region of the elution profile that had been shown to possess sperm receptor activity (designated as region ii in figure 4 and region iv in figure 6). this enrichment of region iv was observed in three independent experiments.
although other size classes of zp3 oligosaccharides were associated with sperm (figure 6), the extent of their association simply reflected their relative abundance in the total population (i.e., no enrichment), as was the case with zp2 oligosaccharides (figure 5). furthermore, selective binding of zp3 oligosaccharides (region iv) appeared to be specific for sperm, since analogous experiments using mouse adipocytes did not demonstrate any selective binding of oligosaccharides to these cells (data not shown). the implication that sperm-associated oligosaccharides found in region iv of bio-gel p-6 profiles should possess sperm receptor activity was tested directly. sperm were incubated with radiolabeled zp3 oligosaccharides, and bound oligosaccharides were then eluted from the sperm, were fractionated on bio-gel p-6, were pooled as indicated in figure 6, and were tested for sperm receptor activity. when sperm were exposed to oligosaccharides found in region iv (~3.4-4.5 kd apparent molecular weight), and were then incubated with unfertilized eggs, a 50% inhibition of sperm binding was observed. other regions of the elution profile were without effect on sperm binding, even though they, too, had been adsorbed to sperm (see legend to figure 6). similarly, all regions of the elution profile of sperm-associated zp2 oligosaccharides were tested for sperm receptor activity and were found to be completely inactive (see legend to figure 5).

figure 5 legend. o-linked oligosaccharides were released from purified zp2, were radiolabeled with 3h-nabh4, and were chromatographed on bio-gel p-2 and p-6 as described in experimental procedures and the legend to figure 3. prior to gel filtration, one portion of radiolabeled material was incubated with sperm for 1 hr at 37°c, the sperm were centrifuged through dibutyl phthalate into 0.5 m sucrose/1% triton x-100, and the detergent phase was collected and was centrifuged to remove insoluble material (see experimental procedures). shown are the bio-gel p-6 elution profiles for radiolabeled oligosaccharides not incubated with sperm and associated with sperm after a 1 hr incubation. in the case of radiolabeled oligosaccharides associated with sperm, fractions were pooled (regions i-viii), were lyophilized, were resuspended in culture medium (30 μl), and were assayed for sperm receptor activity in the in vitro competition binding assay. none of the fractions examined exhibited sperm receptor activity. the control level of sperm binding was 32.2 ± 7.6 sperm bound/egg in the experiment shown. the values for regions i-viii, expressed as a percent of the control, were 100 ± 8, 101 ± 6, 94 ± 5, 98 ± 4, 102 ± 2, 105 ± 2, 99 ± 1, and 99 ± 1, respectively. ferritin and 3h-borohydride eluted in regions ii (void volume) and viii, respectively.

[...] bound to sperm

results presented above (table 2) strongly suggest that zp3 possesses o-linked oligosaccharides that are released on mild alkaline hydrolysis. since a specific size class of zp3 oligosaccharides was found associated with sperm (figure 6) and this material possessed sperm receptor activity, we determined directly the linkage class of these oligosaccharides. egg zp3 was subjected to mild alkaline hydrolysis in the presence of 3h-nabh4 as before. under these conditions, glycosidically linked sugars are converted to their respective sugar alcohols; 3h-n-acetyl-d-glucosaminitol and 3h-n-acetyl-d-galactosaminitol are derived from n- and o-linked oligosaccharides, respectively. oligosaccharides were separated from peptides, from na+, and from borates, were incubated in the presence or absence of sperm, were hydrolyzed, were re-n-acetylated, and were analyzed by descending paper chromatography as described in experimental procedures. in the case of oligosaccharides not adsorbed to sperm, the majority of the tritium incorporated into hexosamines was found in n-acetyl-d-galactosaminitol (figure 7).
a small amount of radiolabel was observed comigrating with n-acetyl-d-glucosaminitol; however, this constituted 10% or less of the radiolabel present in n-acetyl-d-galactosaminitol (figure 7). when identical analyses were performed on oligosaccharides that had been bound to sperm, once again, the vast majority of radiolabel (>95%) was found in n-acetyl-d-galactosaminitol, rather than in n-acetyl-d-glucosaminitol (figure 7). these results are consistent with those presented in table 2 and provide additional support for our conclusion that o-linked oligosaccharides of zp3 are involved in sperm receptor activity.

figure 6 legend. these experiments were carried out with purified zp3 exactly as described in the legend to figure 5. shown are the bio-gel p-6 elution profiles for radiolabeled oligosaccharides not incubated with sperm and associated with sperm after a 1 hr incubation. the region of the elution profile that was significantly enriched following incubation of oligosaccharides with sperm (region iv) is stippled. in the case of radiolabeled oligosaccharides associated with sperm, fractions were pooled (regions i-viii), were lyophilized, were resuspended in culture medium (30 μl), and were assayed for sperm receptor activity in the in vitro competition binding assay. only region iv exhibited sperm receptor activity, reducing sperm binding by more than 50%, as compared with controls. the control level of sperm binding was 29.1 ± 6.6 sperm bound/egg in the experiment shown. the values for regions i-viii, expressed as a percent of the control, were 93 ± 3, 93 ± 4, 94 ± 1, 49 ± 2, 95 ± 2, 97 ± 2, 100 ± 1, and 96 ± 5, respectively. ferritin and 3h-borohydride eluted in regions ii (void volume) and viii, respectively.

we have found that mammalian sperm-egg interaction provides an attractive system within which to define the role of oligosaccharides in cellular adhesion.
while a role for carbohydrates has been implicated in several other biological systems (culp, 1978; frazier and glaser, 1979; barondes, 1981; ashwell and harford, 1982), it has often been difficult to distinguish between a direct effect of carbohydrates on cellular adhesion and an indirect, modulatory influence (hoffmann and edelman, 1983). in this connection, we recently reported that small glycopeptides derived from mouse egg zp3 possess virtually all of the sperm receptor activity of the intact glycoprotein, suggesting that sperm receptor function is carbohydrate-mediated (florman et al., 1984). here, we have extended these observations by demonstrating directly that a specific size class of o-linked oligosaccharides derived from zp3 binds to sperm and possesses receptor activity.

figure 7 legend ([...] released from zp3). purified egg zp3 was subjected to mild alkaline hydrolysis in the presence of 1 m 3h-nabh4; oligosaccharides were isolated and were analyzed either directly or following adsorption by sperm (see experimental procedures). radiolabeled oligosaccharides were lyophilized, were resuspended in 200 μl of 4 n hcl, and were hydrolyzed for 4 hr at 100°c. hexosamines were isolated by ion-exchange chromatography, were re-n-acetylated, and were resolved as sugar alcohols by descending paper chromatography, as described in experimental procedures. shown are the profiles of radioactivity for purified zp3 oligosaccharides not incubated with sperm and associated with sperm after a 1 hr incubation. the positions of n-acetyl-d-galactosaminitol (i) and n-acetyl-d-glucosaminitol (ii) are indicated.

carbohydrates conjugated to the β-hydroxyl groups of serine and threonine (o-linked) were first reported for bacterial enzymes (hanafusa et al., 1955) and were later shown to be constituents of mammalian mucins and proteoglycans (anderson et al., 1964; bhavanandan et al., 1964; harbon et al., 1964; tanaka et al., 1964).
to date, these glycoconjugates have been found on a large variety of both membrane-associated and secretory proteins (sharon, 1975; marshall, 1979; kornfeld and kornfeld, 1980). o-linked oligosaccharides are structurally diverse, consisting of linear to highly branched chains of 2 to 20 sugars that are added after translation to nascent polypeptide chains by stepwise transfer of monosaccharides (kornfeld and kornfeld, 1980; hanover and lennarz, 1981; schachter and williams, 1982). studies of the functional significance of o-linked oligosaccharides have been severely hindered by the lack of any specific metabolic inhibitors of this type of glycosylation. however, evidence has been presented that o-linked carbohydrates may be involved in antifreeze glycoprotein function (vanderheede et al., 1972), in protection of mucins against proteolysis (allen, 1983), in platelet adhesion (judson et al., 1982; tsuji et al., 1983), and in hemagglutination by vaccinia virus (shida and dales, 1981). several observations suggest that o-linked oligosaccharides are present on the mouse egg's sperm receptor, zp3. first, following either removal of n-linked oligosaccharides with endo f or inhibition of n-linked glycosylation with tunicamycin, zp3 exhibits both heterogeneity with respect to isoelectric point and a molecular weight higher than that of its polypeptide chain (roller and wassarman, 1983; salzmann et al., 1983). second, mild alkaline hydrolysis of zp3, in the absence of nabh4, results in a slight decrease of its molecular weight (figure 2). in the presence of nabh4, serine and threonine residues are converted to alanine and to α-aminobutyric acid, respectively (table 2), and release of n-acetyl-d-galactosaminitol is observed (figure 7). third, treatment of zp3 with either tfms or mild alkali (figure 2), but not with endo f (table 1) or with pronase (florman et al., 1984), results in a loss of its sperm receptor activity.
finally, following mild alkaline hydrolysis of zp3, in the presence of nabh4, sperm receptor activity is found associated with released oligosaccharides having n-acetyl-d-galactosaminitol at the reducing terminus (figure 4, figure 6, and figure 7); this is strong evidence for the presence of o-linked oligosaccharides, since n-linked oligosaccharides do not contain n-acetyl-d-galactosamine (sharon, 1975; marshall, 1979). therefore, zp3 apparently resembles secretory proteins such as hcg (kessler et al., 1979a, 1979b) and fetuin (spiro and bhoyroo, 1974), membrane proteins such as glycophorin (thomas and winzler, 1969; marchesi et al., 1976) and ldl receptor (cummings et al., 1983), and several viral coat proteins (niemann et al., 1982; johnson and spear, 1983) in that it possesses both n- (roller and wassarman, 1983; salzmann et al., 1983) and o-linked oligosaccharides. a discrete size class of o-linked oligosaccharides derived from zp3, but not from zp2, inhibits binding of sperm to eggs in vitro (figure 3 and figure 4). this appears to be a direct effect on the adhesion process, since small glycopeptides derived from zp3 do not trigger the acrosome reaction in vitro (florman et al., 1984). moreover, the same size class of zp3 o-linked oligosaccharides (~3.9 kd apparent molecular weight) that inhibits binding of sperm to eggs also binds preferentially to sperm (figure 6); no particular size class of zp2 oligosaccharides displays such a preference (figure 5). these, as well as other observations presented here (figure 2 and table 1) and elsewhere (bleil and wassarman, 1980a; florman et al., 1984; wassarman et al., 1984b, 1985), strongly suggest that mouse sperm recognize and bind to eggs via o-linked oligosaccharides present on zp3.
in particular, such a situation explains the unusual stability of the sperm receptor activity of zp3 following exposure of the glycoprotein to extremes of temperature, denaturants, or detergents (bleil and wassarman, 1980a; florman and wassarman, 1983), a property characteristic of a number of other putative receptors (bleil and wassarman, 1980a). it has been suggested previously that carbohydrate plays a role in binding sperm to zonae pellucidae, since various lectins, monosaccharides, and glycoconjugates inhibit the binding of sperm to mammalian eggs (ahuja, 1982; oikawa et al., 1973; huang et al., 1982; shur and hall, 1982; wassarman et al., 1984b). similarly, it has been reported that various monosaccharides and polysaccharides inhibit sperm-egg interaction in several invertebrate and plant species (bolwell et al., 1979, 1980; rosati and de santis, 1980; glabe et al., 1982; barnum and brown, 1983). in sea urchins, such observations are particularly relevant, since several lines of evidence suggest that gamete adhesion is mediated by "bindin", a lectin-like sperm protein associated with acrosomes, and by a carbohydrate-containing sperm receptor in the egg's vitelline envelope (vacquier and moy, 1977; glabe and vacquier, 1978; glabe, 1979; glabe and lennarz, 1979, 1981; glabe et al., 1982; rossignol and lennarz, 1983; rossignol et al., 1984). in the case of one species of sea urchin, strongylocentrotus purpuratus, galactosamine has been detected in glycoconjugates that are derived from vitelline envelopes and have sperm receptor activity (rossignol et al., 1984). this suggests possible structural analogies with the mouse egg's sperm receptor. in the mouse, it remains to be determined whether or not o-linked oligosaccharides on zp3 are recognized by a lectin-like protein, analogous to bindin, on sperm.
should such a protein be present, it would have to be located on the plasma membrane overlying the sperm head, rather than on the acrosomal membrane, since only sperm that have not undergone the acrosome reaction bind to mouse eggs (saling and storey, 1979; florman and storey, 1982; bleil and wassarman, 1983). the mouse egg's sperm receptor plays a multifaceted role in the regulation of fertilization. in addition to mediating binding of sperm to eggs, zp3 induces bound sperm to undergo the acrosome reaction (bleil and wassarman, 1983; florman et al., 1984; wassarman et al., 1985) and participates in the secondary block to polyspermy (wolf, 1981; schmell et al., 1983; wassarman et al., 1984b). the latter apparently involves modification of zp3 following fertilization, such that it no longer possesses sperm receptor activity (bleil and wassarman, 1980a, 1983). in sea urchins, it has been suggested that proteases, originating from the egg's cortical granules, release sperm receptors from the vitelline envelope following fertilization (vacquier et al., 1972, 1973; glabe and vacquier, 1978). since zp3 purified from 2-cell embryo zonae pellucidae, as well as glycopeptides derived from embryo zp3, do not possess receptor activity, it would appear that zp3 throughout the zona pellucida is modified following fertilization. although zp2 does undergo limited proteolysis following either fertilization or parthenogenetic activation (bleil et al., 1981), there is no evidence as yet for proteolysis of zp3 (bleil and wassarman, 1980a; bleil et al., 1981). whatever the nature of the modification, it is subtle, not rendering embryo zp3 distinguishable from egg zp3 by conventional electrophoretic analysis (bleil and wassarman, 1980a; p. wassarman, unpublished results).
we suggest that modification of the o-linked oligosaccharides described here, by a specific cortical granule glycosidase, could account for inactivation of zp3 following either fertilization or parthenogenetic activation. detailed characterization of o-linked oligosaccharides derived from both egg and embryo zp3, as well as characterization of cortical granule glycosidases, will be necessary to resolve this issue. finally, it should be noted that mammalian sperm receptors exhibit a certain degree of species specificity (bedford, 1981; yanagimachi, 1984; wassarman et al., 1984b). fertilization of zona pellucida-free eggs by heterologous sperm in vitro is quite common, whereas hybrid fertilization of zona pellucida-intact eggs is rare (adams, 1974; yanagimachi, 1981, 1984; barros and leal, 1980; gulyas and schmell, 1981). the question of whether or not the o-linked oligosaccharides derived from zp3 inhibit sperm binding in a species-specific manner has not been addressed in this investigation. in this connection, it has been demonstrated that, although sperm receptor activity is associated with glycopeptides derived from sea urchin vitelline envelopes, the glycopeptides do not exhibit the species specificity observed with high molecular weight, vitelline envelope glycoconjugates (kinsey and lennarz, 1981; rossignol et al., 1983; glabe and lennarz, 1981). it will certainly be of interest to compare the structure of the zp3 o-linked oligosaccharides described here with functionally analogous oligosaccharides from other mammalian species.

collection and culture of mouse gametes

gamete incubations were routinely carried out under oil, in a mouse gamete culture medium supplemented with 0.4% polyvinylpyrrolidone-40 (msecm), at 37°c in an environment of 5% co2 in air. these conditions are capable of supporting mouse sperm capacitation and fertilization in vitro (florman et al., 1984).
mature (>4 weeks old), female, swiss albino mice (cd-1; charles river breeding labs) were injected with 10 iu of pregnant mare's serum gonadotropin (pmsg; sigma), followed in 48 hr by 10 iu of human chorionic gonadotropin (hcg; sigma). ovulated eggs, recovered from oviducts 13-16 hr after hcg, were freed of surrounding cumulus cells with hyaluronidase (0.1% in msecm; type v ovine testicular hyaluronidase; sigma). embryos at the 2-cell stage were flushed from oviducts 37-40 hr after hcg injection. the caudae epididymes of mature (retired breeders), male, swiss albino mice were punctured with sterile, steel needles, releasing sperm into msecm. after 10 min, epididymal tissue was removed, sperm concentrations were adjusted to 4 x 10^6/ml, and sperm were preincubated for 30 min. sperm motility was assessed with an inverted phase microscope; preparations with less than 70% of the sperm motile at the end of the preincubation period were discarded.

preparation of zonae pellucidae

zonae pellucidae were removed from eggs and embryos with micropipettes (~60 µm internal diameter), were washed by transfer through pbs (ph 7.5) supplemented with 0.4% polyvinylpyrrolidone-40 (pbs-pvp), and were solubilized in 1-2 µl of 5 mm nah2po4 (ph 2.5). when required, zonae pellucidae were radiolabeled with 125i-bolton-hunter reagent (~4000 ci/mmole, new england nuclear), as previously described (greve et al., 1982). after sds-page, zona pellucida glycoproteins were obtained by electroelution from gel slices, followed by electrodialysis, dialysis against 7 m urea and then against distilled water, and lyophilization (bleil and wassarman, 1980a).

deglycosylation of zona pellucida glycoproteins was carried out in the presence of trifluoromethanesulfonic acid (tfms; sigma), as described by edge et al. (1981). in control experiments, distilled water was substituted for tfms. protein was recovered by extraction with diethyl ether and 50% (v/v) aqueous pyridine.
the aqueous phase was dialyzed against distilled water, and aliquots were taken for radiolabeling with 125i-bolton-hunter reagent, followed by sds-page analysis and determination of sperm receptor activity (see below). under these conditions the electrophoretic mobility of bsa was unaffected. selective deglycosylation was achieved by two different procedures. to remove n-linked oligosaccharides, glycoproteins were treated with endo-β-n-acetyl-d-glucosaminidase f (endo f; elder and alexander, 1982). lyophilized samples were resuspended in 25 µl of endo f buffer (100 mm nah2po4 (ph 6.1), 50 mm edta, 1% np-40, 0.1% sds, and 1% β-mercaptoethanol), were boiled for 2 min, and were cooled to 37°c. digestions were initiated by the addition of 1 µl endo f (provided in a 50% glycerol/25 mm edta solution by dr. j. h. elder) and were terminated by boiling for 1 min. incubation under these conditions for 4 hr at 37°c was sufficient for complete digestion of zona pellucida glycoproteins (it was noted that under these conditions the electrophoretic mobility of bsa was unaffected, indicating the absence of protease contaminants). control incubations received either 1 µl of 50% glycerol/25 mm edta or 1 µl of heat-inactivated (100°c for 1 min) endo f. samples were electrodialyzed, were dialyzed exhaustively against 7 m urea and then against distilled water, and aliquots were taken for electrophoretic analysis and for evaluation of sperm receptor activity. o-linked oligosaccharides were removed by alkaline β-elimination. zona pellucida glycoproteins were lyophilized and were resuspended in 50 µl of 5 mm naoh. when reducing conditions were required to prevent alkaline degradation of released oligosaccharides (lloyd, 1976), reactions were carried out in the presence of 1 m 3h-nabh4 (~100 mci/mmole, new england nuclear). samples were incubated for 16 hr at 37°c in a toluene atmosphere. in control samples, distilled water (ph 7.0) was substituted for naoh.
reactions were terminated by cooling samples to 4°c, followed by acidification to ph 6.0 with 0.1 n acetic acid to eliminate excess 3h-nabh4. samples containing oligosaccharides were neutralized with 1 n naoh, were separated from peptides and from na+ on 5 ml columns of dowex 50x4-400 (h+ form; 200-400 mesh), were lyophilized, were resuspended in 1% acetic acid in methanol, and were dried under a stream of n2. methanol evaporations were repeated 4 times to remove excess borate as its volatile methyl ester derivative (zill et al., 1953). conditions for the extensive proteolysis of zona pellucida glycoproteins with cmc-conjugated pronase (sigma) have previously been described (florman et al., 1984).

binding studies

oligosaccharides derived from egg zonae pellucidae and radiolabeled during mild alkaline reduction were lyophilized and were resuspended in msecm at a concentration of about 2.7 zonae pellucidae/µl. one µl was removed for gel filtration (see below), and the remaining oligosaccharide solution was divided into 37.5 µl aliquots that were each added to 12.5 µl drops of sperm (4 x 10^6/ml) under oil (2 zonae pellucidae/µl, final concentration). sperm were incubated for 1 hr at 37°c, during which time motility was monitored as previously described (florman et al., 1984). experiments were discarded when oligosaccharide treatment resulted in decreased cell motility relative to untreated control incubations. oligosaccharide binding was assessed by applying 40 µl of the sperm suspension to siliconized, 400 µl eppendorf tubes containing step gradients of 200 µl dibutyl phthalate (sigma) on top of 20 µl 1% triton x-100 in 0.5 m sucrose (cuatrecasas and hollenberg, 1976). since dibutyl phthalate has a density (1.043 g/ml) intermediate between that of msecm and sperm, centrifugation (8500 x g, 30 sec) of sperm yielded a pellet in the sucrose layer, whereas msecm did not penetrate the oil phase.
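the final concentrations quoted for the binding incubation follow from simple dilution arithmetic. the short python sketch below is only an illustrative check of the stated volumes and concentrations; the function name `dilute` and the variable names are hypothetical and do not come from the original procedure.

```python
def dilute(stock_conc, stock_volume, total_volume):
    """Return the concentration after a stock is diluted into total_volume."""
    return stock_conc * stock_volume / total_volume


# Values as stated in the binding-study procedure.
oligo_stock = 2.7        # zona pellucida equivalents per µl (approximate)
oligo_volume = 37.5      # µl of oligosaccharide solution per incubation
sperm_stock = 4e6        # sperm per ml after preincubation
sperm_volume = 12.5      # µl of sperm suspension per incubation

total_volume = oligo_volume + sperm_volume  # 50 µl per incubation drop

final_oligo = dilute(oligo_stock, oligo_volume, total_volume)
final_sperm = dilute(sperm_stock, sperm_volume, total_volume)

if __name__ == "__main__":
    print(f"oligosaccharides: {final_oligo:.1f} zonae pellucidae/µl")  # 2.0
    print(f"sperm: {final_sperm:.1e} per ml")                          # 1.0e+06
```

the result, roughly 2 zona pellucida equivalents per µl, agrees with the final concentration given in the text; the fourfold dilution of sperm to about 1 x 10^6/ml is implied by the volumes rather than stated explicitly.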
aliquots were taken from the medium and from the sucrose layers for both determination of radioactivity and for gel filtration. zona pellucida oligosaccharides were also incubated with adipocytes isolated from mouse epididymal fat pads (rodbell, 1964). after 1 hr incubation, 40 µl of adipocyte suspensions were layered on top of 200 µl of dioctyl phthalate (aldrich). centrifugation (8500 x g, 30 sec) displaced incubation medium to the bottom of the tube, while the less dense adipocytes remained as a layer of packed cells above the oil phase (dubyak and kleinzeller, 1980).

gel filtration of oligosaccharides

to analyze zona pellucida oligosaccharides following sperm binding, 5 µl samples of either the starting material (3h-oligosaccharides in msecm prior to the addition of sperm) or of the sucrose phase, following centrifugation (see above), were brought to 25 µl with distilled water. gel filtration of 3h-oligosaccharides was carried out at 55°c on bio-gel p-6 (200-400 mesh; 1.5 x 70 cm) that had been previously equilibrated in distilled water (heyraud and rinaudo, 1978; yamashita et al., 1982). columns were developed in distilled water, and 1.5 ml fractions were collected, lyophilized, resuspended in msecm, and tested for sperm receptor activity. recovery from these columns varied from 80 to 97%.

linkage analysis

amino acid analysis was carried out after mild alkaline reduction of zona pellucida glycoproteins. lyophilized samples were resuspended in 2 µl of a 5 mm l-alanine, 5 mm l-α-aminobutyric acid solution, and 200 µl of 6 n hcl. in some experiments samples were dissolved in 200 µl hcl and 2 µl of an amino acid mixture (5 mm of each of the biologically relevant amino acids, as well as of l-α-aminobutyric acid). glycoproteins were hydrolyzed in vacuo at 110°c for 18 hr, were dried in a desiccator over naoh, and were washed 3 times by methanol evaporation to remove borates and with distilled water to remove hcl.
dried hydrolysates were resuspended in 10 µl of distilled water, insoluble material was removed by centrifugation (8500 x g, 1 min), and 1-5 µl aliquots were applied to cellulose, thin-layer chromatography plates (20 x 20 cm; chromogram, eastman). chromatograms were developed in an unsaturated atmosphere of n-butanol:acetone:diethylamine:water (10:10:2:5, ph 12.0) in the first dimension and in an atmosphere of isopropanol:formic acid (99%):water (40:2:10, ph 2.5) in the second dimension (brenner and niederwieser, 1967). amino acids were visualized with ninhydrin. spots were scraped into scintillation vials, were extracted with 0.5 ml distilled water (30 min, 60°c), and associated 3h-radioactivity was determined by liquid scintillation spectroscopy with 10 ml aquasol (new england nuclear). in some experiments, the entire chromatogram was divided into a 1 x 1 cm grid, and the distribution of tritium was evaluated. the identity of carbohydrates glycosidically linked to zp3 was determined following mild alkaline reduction. analysis was performed both on total 3h-oligosaccharides from zp3 and on material bound by sperm (see above). oligosaccharides were hydrolyzed in vacuo in 4 n hcl for 4 hr at 100°c, were deacidified by repeated washes with distilled water under a stream of n2, and were resuspended in 1 ml distilled water. amino sugars were eluted from 5 ml columns of dowex-50 (h+ form; 200-400 mesh) with 2 n hcl, were deacidified as previously described, were resuspended in 0.2 ml saturated nahco3, and were re-n-acetylated with acetic anhydride (takasaki and kobata, 1975). the fraction eluted from coupled columns of dowex-50 (h+ form; 200-400 mesh; 0.5 ml bed volume) and dowex-1 (oh- form; 200-400 mesh; 1 ml bed volume) with distilled water was then dried under n2, was resuspended in 10 µl 35% ethanol, and was applied to borate-impregnated whatman #1 paper.
chromatograms were developed, in the descending direction, with ethyl acetate:pyridine:water (2:1:2; cabib et al., 1953), were cut into 1 cm long strips, and the associated radioactivity was determined by liquid scintillation spectroscopy. standards were chromatographed in adjacent lanes and were visualized by staining with silver nitrate.

gel electrophoresis

sds-page was performed according to the method of laemmli (1970) by using a 10% acrylamide separating gel and a 4% acrylamide stacking gel. nonreducing conditions were used where indicated.

determination of sperm receptor activity

sperm receptor activity was determined in an in vitro competition binding assay (bleil and wassarman, 1980a; florman et al., 1984). aliquots (10 µl) of preincubated sperm suspensions were added to 30 µl msecm containing substances to be tested for sperm receptor activity. after 60 min at 37°c, ten unfertilized eggs and three 2-cell embryos were added in 1-2 µl msecm. thirty minutes later, eggs and embryos, with associated sperm, were removed with a wide-bore micropipette (internal diameter >100 µm) and were pipetted until no more than 1-2 sperm remained attached to embryo zonae pellucidae. sperm associate reversibly and nonspecifically with embryo zonae pellucidae, but establish both nonspecific and tenacious, specific bonds to zonae pellucidae of eggs; thus, these conditions serve to remove nonspecifically associated sperm from egg zonae pellucidae (bleil and wassarman, 1980a). eggs and embryos were then transferred to microscope slides and were fixed with 3% glutaraldehyde in pbs-pvp, and the number of bound sperm was determined with an inverted phase microscope.

we are grateful to dr. john h. elder for a generous gift of endoglycosidase f and to the members of our laboratory for advice and constructive criticism throughout the course of this research.
the research was supported in part by a grant from the national institute of child health and human development (hd-12275) awarded to p. m. w. h. m. f. is a national institutes of health postdoctoral fellow. the costs of publication of this article were defrayed in part by the payment of page charges. this article must therefore be hereby marked "advertisement" in accordance with 18 u.s.c. section 1734 solely to indicate this fact. received january 8, 1985; revised february 14, 1985

species specificity in fertilization fertilization studies in the hamster. the role of cell-surface carbohydrates mucus-a protective secretion of complexity threonine and serine linkages in mucopolysaccharides and glycoproteins carbohydrate specific receptors in the liver effects of lectins and sugars on primary sperm attachment in the horseshoe crab, limulus polyphemus l lectins: their multiple endogenous cellular functions in vitro fertilization and its use to study gamete interactions why mammalian gametes don't mix the complete degradation of glycopeptides containing o-seryl and o-threonyl linked carbohydrate mammalian sperm-egg interaction: identification of a glycoprotein in mouse egg zonae pellucidae possessing receptor activity for sperm synthesis of zona pellucida proteins by denuded and follicle-enclosed mouse oocytes during culture in vitro structure and function of the zona pellucida: identification and characterization of the proteins of the mouse oocyte's zona pellucida sperm-egg interactions in the mouse: sequence of events and induction of the acrosome reaction by a zona pellucida glycoprotein mammalian sperm-egg interaction: fertilization of mouse eggs triggers modification of the major zona pellucida glycoprotein fertilization in brown algae. ii. evidence for lectin-sensitive complementary receptors involved in gamete recognition in fucus serratus fertilization in brown algae. iii.
preliminary characterization of putative gamete receptors from eggs and sperm of fucos serratus the red cell membrane structures and functions of glycoproteins post-translational glycosylation of coronavirus glycoprotein el: inhibition by monensin mild alkaline borohydride treatment of glycoproteins-a method for liberating both n-and o-linked carbohydrate chains wheat germ agglutinin blocks mammalian fertilization funcbon of the carbohydrate moieties of glycoproteins protein denaturation mild alkaline borohydride treatment liberates n-acetylglucosamine-linked oligosaccharide chains of glycoproteins metabolism of isolated fat cells. i. effects of hormones on glucose metabolism and lipolysis role of asparagine-linked oligosaccharides in secretion of glycoproteins of the mouse egg's extracellular coat role of the surface carbohydrates brenner, m., and niederwieser, a. (1967) . thin layer chromatography of amino acids. meth. enzymol. 77, cabrb, e., leloir, l. f., and cardim, c. e. (1953) . uridine diphosphate acetylglucosamine. j. biol. chem. 203, 1055 -1070 crestfield, a. m., moore, s., and stein, w. h. (1963) . the preparation and enzymatic hydrolysis of reduced and s-carboxymethylated proteins. j. biol. chem. 238, 622-627. cuatrecasas, p., and hollenberg, m. d. (1976) . membrane receptors and hormone action. adv. prot. chem. 30, 251-451. culp, l. a. (1978) . biochemical determinants of cell adhesion. curr. topics membranes and transport 77, 327-396.cummings, ft. d., kornfeld, s., schneider, w. j., hobgood, k. k., tolleshaug, h., brown, m. s., and goldstein, j. l. (1983) . biosynthesis of n-and o-linked oligosaccharides of the low density lipoprotein receptor. j. biol. chem. 258, 15261-15273. dubyak, g. r., and kleinzeller, a. (1980) . the insulin-mimetic effects of vanadate in isolated rat adipocytes. dissociation from effects of vanadate as a (na'-k')atpase inhibitor. j. biol. chem. 255, 5306-5312.edge, a. s. b., faltynek, c. r., hof, l., reichert, l. 
e., and weber, p. (1981) schmell, e. d., gulyas, 8. j., and hedrick, j. l. (1983) . egg surface changes during fertilization and the molecular mechanism of the block to polyspermy.in chem. 244, 5943-5946. tsuji, t, tsunehisa, s., watanabe, y., yamamoto, k., tohyama, h., and osawa, t. (1983) . the carbohydrate moiety of human platelet glycocalitin. the structure of the major ser/thr-linked sugar chain. j. biol. chem. 258, 6335-6339. vacquier, v. d., and moy, g. w. (1977) . isolation of bindin: the protein responsible for adhesion of sperm to sea urchin eggs. proc. natl. acad. sci. usa 74, 2456 -2460 . vacquier, v. d., epel, d., and douglas, l. a. (1972 . sea urchin eggs release protease activity at fertilization. nature 237, 34-36. vacquier, v. d., tegner, m. j., and epel, d. (1973) key: cord-336696-c3rbmysh authors: oberfeld, blake; achanta, aditya; carpenter, kendall; chen, pamela; gilette, nicole m.; langat, pinky; said, jordan taylor; schiff, abigail e.; zhou, allen s.; barczak, amy k.; pillai, shiv title: snapshot: covid-19 date: 2020-04-30 journal: cell doi: 10.1016/j.cell.2020.04.013 sha: doc_id: 336696 cord_uid: c3rbmysh abstract coronavirus disease 2019 (covid-19) is a novel respiratory illness caused by sars-cov-2. viral entry is mediated through viral spike protein and host ace2 enzyme interaction. most cases are mild; severe disease often involves cytokine storm and organ failure. therapeutics including antivirals, immunomodulators, and vaccines are in development. to view this snapshot, open or download the pdf. anti-viral rdrp inhibitor* other antiviral* lopinavirritonavir rt-pcr, naat, crispr-based* snapshot: covid-19 blake oberfeld, 1 aditya achanta, 1 kendall carpenter, 1 pamela chen, 1 nicole m. gilette, 1 pinky langat, 1 jordan t. said, 1 abigail e. schiff, 1,2, allen s. zhou, 1 amy k. barczak, 1,2 and shiv pillai in december 2019, several cases of pneumonia of unknown origin were reported in wuhan, china. 
the causative agent was characterized as a novel coronavirus, initially referred to as 2019-ncov and renamed severe acute respiratory syndrome coronavirus-2 (sars-cov-2) (zhou et al., 2020b). this respiratory illness, coronavirus disease 2019 (covid-19), has spread rapidly by human-to-human transmission, caused major outbreaks worldwide, and resulted in considerable morbidity and mortality. on march 11, 2020, who classified covid-19 as a pandemic. it has stressed health systems and the global economy, as governments balance prevention, clinical care, and socioeconomic challenges. classified in the coronaviridae family and betacoronavirus genus, sars-cov-2 is the seventh coronavirus known to infect humans. coronaviruses are enveloped positive-sense, single-stranded rna viruses with mammalian and avian hosts. human coronaviruses include 229e, nl63, oc43, and hku1, which are associated with mild seasonal illness, as well as viruses responsible for past outbreaks of severe acute respiratory syndrome (sars) and middle east respiratory syndrome (mers). genetic analyses implicate bats as a natural reservoir of coronaviruses and other animals as potential intermediate hosts in the emergence of sars-cov-2 (andersen et al., 2020). the sars-cov-2 30 kb genome encodes proteases and an rna-dependent rna polymerase (rdrp) as well as several structural proteins. the sars-cov-2 virion is composed of a helical capsid formed by nucleocapsid (n) proteins bound to the rna genome and an envelope made up of membrane (m) and envelope (e) proteins, coated with trimeric spike (s) proteins (zhou et al., 2020b). the s protein binds to the ace2 enzyme on the plasma membrane of type 2 pneumocytes and intestinal epithelial cells. after binding, the s protein is cleaved by a host membrane serine protease, tmprss2, facilitating viral entry (hoffmann et al., 2020).
based on our understanding of sars and mers, and their similarity to covid-19, the human immune response in mild cases is likely characterized by a robust type i interferon antiviral response and cd4+ th1 and cd8+ t cell response, resulting in viral clearance. in severe cases, there is likely an initial delay in the antiviral response and subsequently increased production of inflammatory cytokines with an influx of monocytes and neutrophils into the lung, leading to cytokine storm syndrome. these cytokines, including interleukin (il)-1, il-6, il-12, and tumor necrosis factor-α, lead to increased vascular permeability and may contribute to respiratory failure (prompetchara et al., 2020). another hallmark of severe disease is lymphopenia, which may be due to direct infection of lymphocytes or suppression of bone marrow by the antiviral response. neutralizing igm and igg antibodies to sars-cov-2 can be detected within 2 weeks of infection; it is still unknown whether patients are protected from reinfection (wölfel et al., 2020; prompetchara et al., 2020). sars-cov-2 is thought to spread primarily via respiratory droplet and fomite transmission, although the possibility of fecal-oral transmission is being investigated (wölfel et al., 2020). it can spread over longer distances when aerosolized. once infection is established, the clinical course of covid-19 is variable, making both case identification and triage difficult. notably, asymptomatic and presymptomatic transmission has been described. for those who become symptomatic, the incubation period, the time from exposure to symptom onset, is 4-5 days on average (li et al., 2020). the most common symptoms include cough, fever, and fatigue. for a minority of patients, the disease worsens approximately 5-10 days after symptom onset, resulting in complications such as acute respiratory distress syndrome (ards) and other end organ failure (zhou et al., 2020a).
patients over 60 and those with comorbid conditions, including cardiovascular disease, underlying respiratory conditions, and cancer, are at higher risk for these severe complications and death. in comparison, children have a milder clinical course (cdc, 2020). reverse transcriptase-polymerase chain reaction of respiratory samples remains the gold standard for diagnosing covid-19, though immunoassays, isothermal nucleic acid amplification tests, and crispr-based diagnostic tests are in development to facilitate rapid point-of-care testing and address global testing shortages (pang et al., 2020) . among those diagnosed, common laboratory findings include lymphopenia, elevated markers of inflammation including c-reactive protein, and elevated markers of coagulation cascade activation including d-dimer; higher viral load and inflammatory marker levels correlate with increased disease severity. chest computed tomography (ct) scans of symptomatic patients are sensitive for detecting disease but nonspecific (cdc, 2020). the current management of covid-19 is focused on infection control, supportive care including ventilatory support as needed, and treatment of sequelae and complications. patients with suspected covid-19 who are asymptomatic or mildly ill are recommended to self-isolate for 2 weeks from the day of exposure, use acetaminophen as needed, remain hydrated, and monitor for worsening symptoms. patients with more severe disease are admitted to the hospital for treatment of hypoxia, respiratory failure, ards, and septic shock. multiple clinical trials are underway to define potential roles for antiviral agents and specific immunomodulators. 
antiviral agents under investigation include inhibitors of endosome maturation (hydroxychloroquine), inhibitors of viral rna-dependent rna polymerase (remdesivir, favipiravir), and inhibitors of viral protein synthesis and maturation (lopinavir/ritonavir); immunomodulators under investigation include interferon-β and blockade of il-6 receptor or il-6 (tocilizumab, siltuximab, sarilumab) (mccreary and pogue, 2020). passive immunization with convalescent plasma and active immunization strategies involving live-attenuated virus, chimeric virus, subunit, nanoparticle, rna, and dna are in development and testing. as the field looks toward the future of covid-19 therapy, temporality of treatment should be considered, as some therapies could show greater efficacy at different disease stages. s.p. is on the scientific advisory board of abpro, inc.

the proximal origin of sars-cov-2 coronavirus disease 2019 (covid-19) sars-cov-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia coronavirus disease 2019 treatment: a review of early and emerging options potential rapid diagnostics, vaccine and therapeutics for 2019 novel coronavirus (2019-ncov): a systematic review immune responses in covid-19 and potential vaccines: lessons learned from sars and mers epidemic.
asian pac virological assessment of hospitalized patients with covid-2019 clinical course and risk factors for mortality of adult inpatients with covid-19 in wuhan, china: a retrospective cohort study a pneumonia outbreak associated with a new coronavirus of probable bat origin key: cord-287815-alv30uk5 authors: mellman, ira; simons, kai title: the golgi complex: in vitro veritas? date: 1992-03-06 journal: cell doi: 10.1016/0092-8674(92)90027-a sha: doc_id: 287815 cord_uid: alv30uk5 nan iplex has proved to be among the more challenging probllems in cell biology. the last several years have turned out ito be particularly exciting in this respect since they have iyielded new insights and ideas at an increasingly rapid ipace. this period of advance has largely been due to the idevelopment of powerful new biochemical, morphological, #and genetic approaches to unraveling the complexities of 'this organelle. while much remains to be discovered, the iproblem now is how to integrate this wealth of information. 'to see if this is possible, we will first summarize how the lslolgi is commonly believed to work and then evaluate the lstrength of the evidence that underlies these views. present view of the golgi 'the golgi complex is essentially a carbohydrate factory. in 0.05, *: p ≤ 0.05, **: p ≤ 0.01, ***: p ≤ 0.001, ****: p ≤ 0.0001. after 48 hr, cell culture supernatants were collected and stored at -80°c. virus titers were determined by plaque assays on vero e6 monolayers greiner bio-one, #662160) and rocked for 1 hr at room temperature. the cells were subsequently overlaid with mem containing 1% cellulose the plaques were visualized by fixation of the cells with a mixture of 10% formaldehyde and 2% methanol (v/v in water) for 2 hr. the monolayer was washed once with pbs and stained with 0.1% crystal violet (millipore sigma # v5265) prepared in 20% ethanol the pennsylvania state university, following the guidelines approved by the institutional biosafety committees. 
human bronchial epithelial cell air-liquid interface generation and infection human bronchial epithelial cells (hbecs, lonza) were cultured in t75 flasks in plus medium according to manufacturer instructions (stemcell technologies) to generate air-liquid interface (ali) cultures, hbecs were plated on collagen i-coated 24 well transwell inserts with a 0.4-micron pore size (costar, corning) at 5x10 4 cells/ml. cells were maintained for 3-4 days in pneumacult-ex plus medium until confluence, then changed to pneumacult-ali medium triglyceride-rich lipoprotein binding and uptake by heparan sulfate proteoglycan receptors in a crispr/cas9 library of hep3b mutants remdesivir for the treatment of covid-19 -preliminary report guinea fowl coronavirus diversity has phenotypic consequences for glycan and tissue binding heparan sulfate proteoglycans and viral attachment: true receptors or adaptation bias? viruses 11 undersulfated and glycol-split heparins endowed with antiangiogenic activity the 2019 coronavirus (sars-cov-2) surface protein (spike) s1 receptor binding domain undergoes conformational change upon heparin binding identification of a major co-receptor for primary isolates of hiv-1 hiv-1 entry into cd4+ cells is mediated by the chemokine receptor cc-ckr-5 special considerations for proteoglycans and glycosaminoglycans and their purification order out of chaos: assembly of ligand binding sites in heparan sulfate age-dependent modulation of heparan sulfate structure and function bioengineering murine mastocytoma cells to produce anticoagulant heparin ucsf chimerax: meeting modern challenges in visualization and analysis structural analysis of urinary glycosaminoglycans from healthy human subjects human milk oligosaccharides inhibit rotavirus infectivity in vitro and in acutely infected piglets human coronaviruses oc43 and hku1 bind to 9-o-acetylated sialic acids via a conserved receptor-binding site in spike protein domain a loss of bcl-6-expressing t follicular helper 
cells and germinal centers in covid-19 stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis initial step of virus entry: virion binding to cell-surface glycans how good is automated protein docking? the cluspro web server for protein-protein docking appion: an integrated, database-driven pipeline to facilitate em image processing inhibition of sars pseudovirus cell entry by lactoferrin binding to heparan sulfate proteoglycans evolutionary differences in glycosaminoglycan fine structure detected by quantitative glycan reductive isotope labeling heparan sulfate structure in mice with genetically modified heparan sulfate production assessing ace2 expression patterns in lung tissues in the pathogenesis of covid-19 angiotensin-converting enzyme 2 is a functional receptor for the sars coronavirus proteoglycans and sulfated glycosaminoglycans sars-cov-2 spike protein binds heparan sulfate in a length-and sequence-dependent manner entry of human coronavirus nl63 into the cell human coronavirus nl63 utilizes heparan sulfate proteoglycans for attachment to target cells stable heparin-producing cell lines derived from the furth murine mastocytoma herpes simplex virus-1 entry into cells mediated by a novel member of the tnf/ngf receptor family heparin inhibits cellular invasion by sars-cov-2: structural dependence of the interaction of the surface protein (spike) s1 receptor binding domain with heparin membrane protein of human coronavirus nl63 is responsible for interaction with the adhesion receptor structures of mers-cov spike glycoprotein in complex with sialoside attachment receptors localisation and distribution of o-acetylated n-acetylneuraminic acids, the endogenous substrates of the hemagglutinin-esterases of murine coronaviruses, in mouse tissue mediation of human immunodeficiency virus type 1 binding by interaction of cell surface heparan sulfate proteoglycans with the v3 region of envelope gp120-gp41 
improved vectors and genome-wide libraries for crispr screening relion: implementation of a bayesian approach to cryo-em structure determination cell surface receptors for herpes simplex virus are heparan sulfate proteoglycans a novel role for 3-o-sulfated heparan sulfate in herpes simplex virus 1 entry nidovirus sialate-o-acetylesterases: evolution and substrate specificity of coronaviral and toroviral receptor-destroying enzymes the sweet spot: defining virus-sialic acid interactions automated molecular microscopy: the new leginon system effective inhibition of sars-cov-2 entry by heparin and enoxaparin derivatives. biorxiv the versatile heparin in covid-19 structural basis for human coronavirus attachment to sialic acid receptors the war against influenza: discovery and development of sialidase inhibitors structural characterization of human liver heparan sulfate dog picker and tiltpicker: software tools to facilitate particle selection in single particle electron microscopy function, and antigenicity of the sars-cov-2 spike glycoprotein isolation and characterization of heparan sulfate from various murine tissues site-specific glycan analysis of the sars-cov-2 spike a comprehensive compositional analysis of heparin/heparan sulfate-derived disaccharides from human serum generation of vsv pseudotypes using recombinant deltag-vsv for studies on virus entry, identification of entry inhibitors, and immune responses to vaccines cryo-em structure of the 2019-ncov spike in the prefusion conformation vaccines and therapies in development for sars-cov-2 infections initial interaction of herpes simplex virus with cells is binding to heparan sulfate demystifying heparan sulfate-protein interactions structural basis for the recognition of sars-cov-2 by full-length human ace2 cov-2 spike protein interacts with heparan sulfate and ace2 through the rbd • heparan sulfate promotes spike-ace2 interaction • sars-cov-2 infection is co-dependent on heparan sulfate and ace2 • heparin 
and non-anticoagulant derivatives block sars-cov-2 binding and infection in brief provide evidence that heparin sulfate is a necessary co-factor for sars-cov-2 infection. they show that heparin sulfate interacts with the receptor binding domain of the sars-cov-2 spike glycoprotein we thank scott selleck (the pennsylvania state university), eugene yeo (uc san diego), john guatelli (uc san diego), mark fuster (uc san diego) and stephen schoenberger (la jolla institute for immunology) for many helpful discussions, and annamaria naggi and giangiacomo torri from the ronzoni institute for generously providing split-glycol heparin. this key: cord-270082-byxd4o4m authors: doheny, kimberly floy; sorger, peter k.; hyman, anthony a.; tugendreich, stuart; spencer, forrest; hieter, philip title: identification of essential components of the s. cerevisiae kinetochore date: 1993-05-21 journal: cell doi: 10.1016/0092-8674(93)90255-o sha: doc_id: 270082 cord_uid: byxd4o4m abstract we have designed and utilized two in vivo assays of kinetochore integrity in s. cerevisiae. one assay detects relaxation of a transcription block formed at centromeres; the other detects an increase in the mitotic stability of a dicentric test chromosome. ctf13-30 and ctf14-42 were identified as putative kinetochore mutants by both assays. ctf14 is identical to ndc10 cbf2 , a recently identified essential gene that encodes a 110 kd kinetochore component. ctf13 is an essential gene that encodes a predicted 478 amino acid protein with no homology to known proteins. ctf13 mutants missegregate chromosomes at permissive temperature and transiently arrest at nonpermissive temperature as large-budded cells with a g2 dna content and a short spindle. antibodies recognizing epitope-tagged ctf13 protein decrease the electrophoretic mobility of a cen dna-protein complex formed in vitro. together, the genetic and biochemical data indicate that ctf13 is an essential kinetochore protein. 
§present address: european molecular biology laboratory, meyerhoff strasse 1, heidelberg 6900-de, federal republic of germany.
‖present address: center for medical genetics, department of medicine, johns hopkins school of medicine, baltimore, maryland 21205.

the term chromosome cycle describes a fundamental aspect of the cell division cycle in which each of the chromosomal dna molecules first is replicated and then undergoes a series of morphological changes and complex movements to ensure its faithful distribution at mitosis. the gene products responsible for execution of the chromosome cycle include structural components, such as those that assemble into the kinetochore, and regulatory components, such as those that establish checkpoints monitoring the proper completion of ordered events within the cell cycle. saccharomyces cerevisiae offers two major advantages as an experimental organism in which to study the chromosome cycle. first, it is possible to combine classical genetics (isolation and phenotypic analysis of mutants) with recombinant genetics (manipulation of cloned dna segments by recombinant dna methods and subsequent reintroduction into the yeast host). second, all of the cis-acting dna sequence elements required for chromosome maintenance are cloned and well characterized, including functional centromere dna, chromosomal origins of dna replication, and telomere dna (reviewed by newlon, 1988). one structure clearly essential to chromosome distribution is the kinetochore (centromere dna and associated proteins), providing the site of attachment of spindle microtubules. the kinetochore is a relatively simple structure in s. cerevisiae in comparison with the large and complex trilaminar structures seen in higher eukaryotes (rieder, 1982; pluta et al., 1990). in electron microscopic studies of s.
cerevisiae chromosomes, one microtubule is seen to interact with each chromatin molecule, but a structurally differentiated kinetochore is not visible (peterson and ris, 1976). the kinetochore of s. cerevisiae is composed of an approximately 160-220 bp nuclease-resistant core that is centered around the centromere (cen dna) sequence and flanked by an ordered array of nucleosomes (bloom et al., 1984; funk et al., 1989). the cen dna sequence requirements for s. cerevisiae have been rigorously and extensively characterized through mutational analysis (reviewed by carbon and clarke, 1990). approximately 125 bp is sufficient for centromere function (cottarel et al., 1989). comparison of centromeres from different chromosomes reveals that they consist of three centromere dna sequence elements (cdei, cdeii, and cdeiii) (fitzgerald-hayes et al., 1982; hieter et al., 1985). cdei (8 bp) and cdeiii (25 bp) exhibit dyad symmetry and are separated by cdeii, a 76-86 bp sequence of over 90% at content. deletions of cdei and cdeii reveal that they are important but not essential for chromosome segregation, while single-nucleotide point mutations in cdeiii can completely destroy centromere function. although a great deal is known about cen dna sequence determinants in s. cerevisiae, little is known about the proteins required for kinetochore activity or its regulation within the cell cycle. biochemical purification of kinetochore proteins through sequence-specific affinity purification with cen dna (ng and carbon, 1987; lechner and carbon, 1991) has proven to be difficult, apparently owing to the low abundance of the kinetochore proteins and the requirement of accessory factors for binding in vitro. to date, only one cen dna-binding protein, cpf1 (also known as cp1 or cbf1), has been extensively characterized (baker and masison, 1990; mellor et al., 1990; cai and davis, 1990).
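the element lengths quoted above can be checked against the statement that approximately 125 bp is sufficient for centromere function. the following is a minimal sketch of that arithmetic; the function and constant names are illustrative assumptions and are not taken from the paper.

```python
# element lengths in base pairs as quoted in the text; cdeii varies between centromeres
CDEI_BP = 8
CDEII_BP_RANGE = (76, 86)
CDEIII_BP = 25


def core_length_range():
    """Return the (min, max) combined length of cdei + cdeii + cdeiii in bp."""
    low = CDEI_BP + CDEII_BP_RANGE[0] + CDEIII_BP
    high = CDEI_BP + CDEII_BP_RANGE[1] + CDEIII_BP
    return low, high


low, high = core_length_range()
print(f"combined cdei-cdeii-cdeiii length: {low}-{high} bp")
```

the three elements together span 109-119 bp, which fits within the roughly 125 bp reported as sufficient for centromere function.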
cpf1 is a member of the helix-loop-helix family of dna-binding proteins and binds as a homodimer to cdei. a null mutation in cpf1 results in only a 10-fold decrease in the fidelity of chromosome segregation, indicating that it is important but not essential for kinetochore function.

figure 1 legend: the amino-terminal actin orf is represented by the stippled boxes, separated by a line representing the actin intron. the in-frame lacz coding sequence is shown as a hatched box. cen dna is indicated by open boxes, with roman numerals i, ii, and iii indicating cdei, cdeii, and cdeiii, respectively. transcription initiation from gal10 is indicated by the solid arrow, and the length and number of transcripts by the length and width of the dashed arrow. (a) control experiments (cen dna mutation in cis). cen dna transcriptional blocks (wild-type and cdeiii-15c mutant) were tested in a wild-type strain. β-galactosidase activity levels were normalized to 100% with a strain containing the reporter with no cen transcriptional block (top). (b) proposed relaxation of the wild-type cen transcriptional block due to mutation of a kinetochore protein component. perier and carbon (1992) recently described a reporter with cen dna within the gal7 promoter. this situation presumably sets up a competition for binding between kinetochore proteins and transcription initiation factors. the reporter used here is different in that it assays for the relief of a transcriptional block caused by a cen placed downstream of the gal10 promoter.

lechner and carbon (1991) have described the isolation of a multicomponent protein-cen dna complex, cbf3, which is defined as an in vitro activity that can bind cen dna sequences in a cdeiii sequence-specific manner. three major protein species of 110 kd, 84 kd, and 58 kd apparent molecular weight are present in approximately equimolar amounts in the most highly purified preparations, although many substoichiometric species are also present (lechner and carbon, 1991).
the cbf3 preparation has recently been shown to exhibit a minus end-directed mechanochemical motor activity in vitro, observed as translocation of a latex bead covalently attached to cen dna along polymerized microtubules (hyman et al., 1992). classical genetic approaches have also been undertaken to identify s. cerevisiae genes required for chromosome transmission, some of which are expected to encode kinetochore components. several mutant collections have been isolated with the primary criterion of chromosome missegregation, including the ctf (chromosome transmission fidelity; spencer et al., 1990), chl (chromosome loss; kouprina et al., 1993), cin (chromosome instability; hoyt et al., 1990), mcm (minichromosome maintenance; maine et al., 1984), and mif (mitotic fidelity; meeks-wagner et al., 1988) mutants. these mutants could have defects in any of the many components necessary for the chromosome cycle to proceed with high fidelity. secondary criteria can be applied to identify those mutants defective in a particular structure or process. for example, sensitivity to benomyl (a microtubule-destabilizing drug) was used as a secondary screen for the cin collection to identify mutants involved in microtubule function. this recently resulted in the identification of cin8 and kip1 (cin9) (hoyt et al., 1992; saunders and hoyt, 1992; roof et al., 1992), two kinesin-related proteins that are involved in mitotic spindle function. we have designed two secondary screens in order to identify kinetochore mutants. one assay monitors transcriptional readthrough of a centromere, and the other monitors the stability of a test dicentric chromosome. these in vivo assays of the integrity of a test kinetochore were used to screen the ctf mutant collection. the ctf collection consists of 138 independent mutants that exhibit increased loss of a nonessential chromosome.
this collection represents approximately 50 genes whose products are required for high fidelity chromosome transmission in the mitotic cell cycle (spencer et al., 1990). two mutations, ctf13-30 and ctf14-42, tested positive in both secondary screens. we found that ctf14 was identical to ndc10/cbf2, recently shown to encode a 110 kd kinetochore protein (goh and kilmartin, 1993; jiang et al., 1993). through a combination of genetic and biochemical approaches, we have shown that ctf13 is a previously unidentified essential protein that is a component of the s. cerevisiae kinetochore.

readthrough assay and secondary screen of the ctf collection
when transcription from a strong promoter is initiated toward a cen dna sequence, the mitotic segregational function of the centromere is destroyed (hill and bloom, 1987) without disruption of its 180-220 bp nuclease protected region (bloom et al., 1984; hill and bloom, 1987), indicating that at least some of the kinetochore complex remains intact. furthermore, it has been observed that the majority of transcripts terminate at the border of the cen sequence (p. phillipsen, personal communication). these observations suggest that the cen dna-protein complex is responsible for this transcriptional block. in the reporter plasmid used to test this hypothesis (figure 1), the gal10 promoter initiates transcription of an actin-lacz fusion gene. a wild-type cen6 (185 bp) inserted in the actin intron allowed only 1% of the β-galactosidase levels seen when no cen was present (figure 1). the structurally dicentric reporter plasmid was maintained in a functionally monocentric state by keeping transformed strains on medium containing galactose. transcription initiated from the gal10 promoter inactivates the segregational function of the test cen (hill and bloom, 1987).
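the logic of combining the two secondary screens is a set intersection: a mutant is taken forward only if it scores positive in both assays. the sketch below uses the isolate names given in the screening results of this paper; reading the garbled isolate labels as s10 and s26 is an assumption, and the variable names are illustrative.

```python
# isolates scored positive in the transcriptional readthrough screen (7 of 34 tested);
# "s10" and "s26" are assumed readings of labels that are garbled in the source text
readthrough_positive = {"s9", "s10", "s26", "s30", "s42", "s58", "s61"}

# isolates scored positive in the dicentric chromosome stabilization screen (2 of 27 tested)
dicentric_positive = {"s30", "s42"}

# putative kinetochore mutants are those positive in both screens
both = sorted(readthrough_positive & dicentric_positive)
print(both)
```

under these assumptions the intersection contains only s30 (ctf13) and s42 (ctf14), matching the two mutations reported as positive in both screens.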
to test the hypothesis that a cen dna-protein complex was responsible for the transcriptional block, we replaced the wild-type cen6 sequence with a cen6 sequence containing a single-nucleotide point mutation (cdeiii-15c) in the central element of the palindrome of cdeiii (ccg). this transversion from g to c causes a 250-fold increase in the rate of mitotic missegregation of a chromosome fragment (hegemann et al., 1988; jehn et al., 1991). similar central element mutations have been shown by in vivo footprinting to result in decreased methylation protection of cen dna (densmore et al., 1991). the cdeiii-15c cen mutant inserted in the actin intron allowed approximately 20% of the β-galactosidase levels seen when no cen was present (figure 1). thus, a cen dna mutation affecting kinetochore integrity caused an increase in transcriptional readthrough that was detectable by increased levels of β-galactosidase activity. this increase in β-galactosidase activity could also be detected as blue colony color when strains were grown on solid medium containing x-gal (see experimental procedures), providing a sensitive visual assay for rapid screening of the ctf collection. we proposed that the transcriptional block provided by the full-length wild-type cen6 might be relaxed in ctf strains with mutant kinetochore proteins. the reporter minichromosomes used in the control experiments would very likely be present in highly variable copy number in these ctf strains, owing to increased rates of nondisjunction and/or loss. this could result in the appearance of false positives and false negatives. to maintain the reporter in single copy, we integrated it into the ctf strains. two independent transformants of each ctf strain, containing an integrated wild-type cen reporter, were plated on medium containing x-gal and monitored for the appearance of blue colony color (see experimental procedures).
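the readthrough values above are expressed as a percentage of the activity measured with the reporter that carries no cen block. a minimal sketch of that normalization follows; the raw readings are hypothetical numbers in arbitrary units, chosen only so that the output reproduces the 1% and 20% figures quoted in the text, and the function name is an assumption.

```python
def percent_of_control(activity, control_activity):
    """Express a reporter activity as a percentage of the no-cen control."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * activity / control_activity


# hypothetical raw readings (arbitrary units); not data from the paper
raw = {"no cen": 500.0, "wild-type cen6": 5.0, "cdeiii-15c": 100.0}

normalized = {name: percent_of_control(value, raw["no cen"]) for name, value in raw.items()}

# relative readthrough of the mutant block compared with the wild-type block
fold_increase = normalized["cdeiii-15c"] / normalized["wild-type cen6"]
print(normalized, fold_increase)
```

with the percentages quoted in the text, the cdeiii-15c block lets through about 20 times more reporter activity than the wild-type cen6 block, which is the window in which intermediate blue colony color was scored.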
of 34 ctf mutants screened (see experimental procedures for list), 7 were identified as putative kinetochore mutants because they produced an intermediate level of blue colony color between the levels of the ctf+ strains carrying the wild-type and mutant cen reporters. five of these mutant strains (designated "s" followed by an isolate number) are members of complementation groups, s10 (ctf7), s9 (ctf8), s30 (ctf13), s42 (ctf14), and s61 (ctf17), and two contain independent mutations, s26 and s58 (table 1). quantitative measurements of β-galactosidase activity levels in protein extracts from these strains verify the identification of a relaxed transcriptional block by the colony color assay (table 2).

dicentric chromosome stabilization assay and secondary screen
the second assay we developed to screen for kinetochore mutants among the ctf collection is based upon the behavior of dicentric chromosomes as they undergo mitotic segregation. if a chromosome has two centromeres, kinetochores on the same chromatid may become attached to opposite poles of the mitotic spindle (figure 2a). when this occurs, the dna molecule usually breaks, and the dicentric chromosome is rapidly lost or is rearranged to a stable form (mann and davis, 1983; haber and thorburn, 1984). a kinetochore mutant might assemble kinetochores that have a weakened attachment of chromosomal dna to microtubules. this could lead to microtubule detachment before chromatid breakage (figure 2b), resulting in stabilization of the dicentric chromosome. the artificial chromosome fragment present in the ctf strains was an appropriate substrate for the construction of a dicentric test chromosome. the chromosome fragment, a nonessential disome, possesses all the sequences required for proper chromosome segregation.
its stability can be visually monitored by the degree of colony color sectoring (see figure 3b) (spencer et al., 1990; shero et al., 1991), and selective pressure for rearrangement to a stable form is absent because the chromosome fragment is not essential for viability. the gal-cen constructs developed in the transcriptional readthrough assay allow control of the mitotic activity of a centromere by the choice of carbon source in the medium. we constructed a vector that would direct integration of these test conditional centromeres to the /eubd 7 locus present approximately 23 kb from the centromere on the chromosome fragment (see experimental procedures). in control experiments, we examined the stability of dicentric chromosome fragments containing either a nearly wild-type (Δcdei) or a highly defective (cdeiii-15c) secondary conditional cen (figures 3a and 3b). we predicted that upon activation of the secondary cen, the dicentric chromosome fragment containing a strong secondary cen would be highly unstable, resulting in a frequent sectoring phenotype, and the dicentric chromosome fragment containing one wild-type and one defective cen would be relatively stable, resulting in fewer sectors per colony (figure 3b). the actual sectoring phenotypes that resulted, shown in figure 4, were consistent with our hypothesis, indicating that the dicentric stability assay was a feasible screen for kinetochore mutants among the ctf collection (figure 3c). to screen the ctf mutants for stabilization of a dicentric chromosome, the conditional nearly wild-type cen, Δcdei, was integrated into the chromosome fragment present in each strain. the strains were maintained on galactose to induce the gal10 promoter and inactivate the conditional cen (see experimental procedures and figure 3a). two transformants of each ctf strain were streaked onto medium containing dextrose to activate the dicentric state by repressing transcription from the gal10 promoter. the stability of the dicentric chromosome fragment in the ctf strains was visually assessed and compared with the stability of the dicentric chromosome fragment in the ctf+ strain. if the dicentric chromosome fragment was as unstable as in the wild-type background (figure 3b), the ctf strain was scored negative in this assay.

assays were performed on strains grown at 30°c. cent, wild-type cen6. b s16 (ctf9) was not identified as a putative kinetochore mutant by the transcriptional readthrough assay and serves as a negative control. d s42 (ctf14) is inviable at 30°c.

figure 2 legend: in (b), the arrowhead indicates release of the microtubule attachment to the chromosome, allowing the dicentric chromatid to proceed intact to the spindle pole. the hypothesis that a kinetochore mutant might result in the stabilization of a linear dicentric chromosome is based on a previous study of the behavior of dicentric minichromosomes (koshland et al., 1987). a circular minichromosome carrying a single wild-type centromere is quite stable in s. cerevisiae (maintained in 98%-99% of the population under selection), whereas a minichromosome with two wild-type centromeres is highly unstable (maintained in only 6% of the population). however, when two identical partially defective centromeres (which by themselves allowed maintenance of minichromosomes in 91% of the population) were placed on the same minichromosome, the plasmid was not destabilized to the same degree (maintained in 49% of the population). in this case, defective kinetochore function was due to mutation of the cen dna sequence. by analogy, it is possible that kinetochore dysfunction due to a defective or aberrant protein will also result in stabilization of a test dicentric chromosome.
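the minichromosome maintenance figures cited from koshland et al. (1987) can be compared by asking what fraction of the monocentric maintenance level survives when a second identical centromere is added. the sketch below does that arithmetic; the "retention ratio" is an illustrative quantity introduced here, not a measure used in the paper, and 98% is taken as the lower bound of the quoted 98%-99% range.

```python
def retention_ratio(dicentric_pct, monocentric_pct):
    """Fraction of monocentric maintenance retained once a second centromere is added."""
    return dicentric_pct / monocentric_pct


# percentages of the population maintaining the minichromosome, as quoted in the text
wild_type_pair = retention_ratio(dicentric_pct=6, monocentric_pct=98)
defective_pair = retention_ratio(dicentric_pct=49, monocentric_pct=91)

print(f"two wild-type centromeres: {wild_type_pair:.3f}")
print(f"two partially defective centromeres: {defective_pair:.3f}")
```

by this measure a pair of partially defective centromeres retains roughly half of the monocentric maintenance level, whereas a pair of wild-type centromeres retains about 6%, which is the asymmetry the dicentric stabilization screen exploits.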
of 27 ctf mutants tested (see experimental procedures for list), 2 exhibited a reduction in sectoring frequency relative to the wild-type control and were thus identified as putative kinetochore mutants: s30 (ctf13) and s42 (ctf14) (see table 2). the sectoring phenotypes exhibited by s30 (ctf13) and a representative mutant that was scored negative, s16 (ctf9), are shown in figure 4. the sectoring phenotype of s42 (ctf14) carrying the dicentric chromosome fragment is similar to that seen for s30 (ctf13). the sectoring phenotypes of these mutants are similar to that seen with a test dicentric chromosome carrying a weak secondary cen. two ctf strains, s30 (ctf13) and s42 (ctf14), were scored as putative kinetochore mutants in both secondary screens. the ctf14-42 mutation identifies a recently characterized kinetochore component, ndc10/cbf2 (see below). we therefore explored further whether the ctf13-30 mutation had identified a gene encoding a kinetochore component.

figure 3 legend: and active on dextrose medium (the test chromosome behaves as a dicentric). (b) the stability of the test dicentric chromosome fragment can be monitored visually. a trna suppressor gene (sup11) present on the chromosome fragment partially suppresses the accumulation of red pigment caused by the ade2-101 ochre mutation in our strain backgrounds. if the chromosome fragment is present, the strain is white; if it is lost, the strain is red. thus, the number of red sectors that develop during the growth of a colony founded by a haploid cell containing the chromosome fragment (white) is indicative of the rate of loss of the chromosome fragment in the strain. the lines within the circles represent the presence of such red sectors in a white colony. the sectoring phenotypes pictured are those predicted and observed (see figure 4) when nearly wild-type (cen*v) and highly defective (cen-) cen the centromere originally present on the chromosome fragment is fully wild type (cenw).

figure 4 legend: labels at the left indicate the type and number of cen dnas present on the test chromosome fragment. labels across the top indicate the relevant genotype of the pictured strain. s16 (ctf9) was scored negative; its sectoring phenotype with the test dicentric chromosome fragment (column 2, row 2) was the same as that seen in the ctf+ background (column 1, row 2). s30 (ctf13) was scored positive; its sectoring phenotype with the test dicentric chromosome fragment (column 3, row 2) was not as severe as that seen in the ctf+ background (column 1, row 2) and looked similar to that seen with the test dicentric chromosome carrying the weak secondary cen (column 1, row 3).

molecular cloning
ctf13-30 is completely deficient for growth at 37°c, and this temperature sensitivity was shown to cosegregate with its moderate sectoring phenotype at 25°c. ctf13 was cloned by complementation of lethality at 37°c (spencer et al., 1988). a 2.2 kb sau3a fragment that complemented the temperature sensitivity of ctf13-30 was shown to correspond to ctf13 by the directing of an integration event in a heterozygous diploid. this event introduced a prototrophic marker at the genomic site of the cloned dna segment and deleted approximately half of the 2.2 kb genomic sequence (almost the entire ctf13 open reading frame [orf]; see experimental procedures and figure 5b). when the diploid transformants were dissected, it was found that viability segregated 2:2 (see experimental procedures). we concluded that the cloned dna encodes wild-type ctf13 and that ctf13 is an essential gene in s. cerevisiae. ctf13 was localized to the right arm of chromosome xiii using both physical and genetic mapping methods (figure 5a). from the mapping data, we concluded that ctf13 is a previously unidentified gene in s. cerevisiae.
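the inference from 2:2 segregation of viability to essentiality can be illustrated with a toy model of tetrad dissection. this is a minimal sketch under stated assumptions: the diploid is heterozygous for a deletion, each tetrad contains two wild-type and two deletion spores, and a deletion spore dies only if the gene is essential. the function names are hypothetical and nothing here reproduces the authors' procedure.

```python
import random


def dissect_tetrad(essential, rng):
    """Return the viability (True/False) of the four spores of one tetrad."""
    spores = ["wild type", "wild type", "deletion", "deletion"]
    rng.shuffle(spores)  # spore order within a tetrad is arbitrary
    return [not (essential and spore == "deletion") for spore in spores]


def viable_per_tetrad(n_tetrads, essential, seed=0):
    """Count viable spores in each of n_tetrads simulated tetrads."""
    rng = random.Random(seed)
    return [sum(dissect_tetrad(essential, rng)) for _ in range(n_tetrads)]


print(viable_per_tetrad(5, essential=True))
print(viable_per_tetrad(5, essential=False))
```

in this model every tetrad gives two viable and two inviable spores when the deleted gene is essential, and four viable spores when it is not, which is why 2:2 segregation of viability supports the conclusion that the gene is essential.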
the nucleotide sequence of the 2.2 kb ctf73 clone contains a 1.4 kb orf that encodes for a protein of 478 amino acids with a predicted molecular weight of 58 kd ( figures 5b and 8 ). the ctf73 protein shows no significant overall homology at the amino acid level to entries in gen-bank, genpept, gpupdate, swissprot, pir, embl, and emblupdate data bases as of january 1993. the homology searches were performed on the national center for biotechnology information blast network (altschul et al., 1990) . the ctf7 3 protein contains a short acidic serinerich region (amino acids 200-230) that is approximately 40% identical to the first acidic block found in a mammalian centromere-associated protein, cenp-b (pluta et al., 1992) . the significance of this small region of similarity is unclear, and there are no other significant homologies found outside this area. interestingly, there is a possible cdc28 phosphorylation site in the ctf73 protein (sspss, amino acids 224-228) (figure 8 ). analysis lechner and carbon (1991) have described a multiprotein complex (cbf3) present in nuclear extracts of s. cerevisiae cells that binds in vitro to a 350 bp fragment of cen dna. dna footprinting reveals that cbf3 interacts with the cdeiii sequence element. using a modification of the methods of lechner and carbon (1991) we were able to detect the binding of cbf3 complexes present in wholecell extracts to an 88 bp dna probe that spans cdeiii but lacks cdei and cdeii. to determine whether ctf13 is a component of the cdeiii-binding complex, the ctf13 orf was fused to peptide epitopes against which antibodies had been preby southern hybridization to an etectrophoretic karyotype (spencer et al., 1966; gerring et al., 1990a) . ctf73 was positionally mapped by the method of chromosome fragmentation (gerring et al., 199oa) and was localized to the right arm of chromosome xiii, 475 kb from the right arm telomere and 445 kb from the left arm telomere (see experimental procedures). 
this physical location was verified by using the 2.2 kb saum fragment to probe a set of filters containing contiguous overlapping 1 clones that cover 66% of the yeast genome (l. riles and m. olson, unpublished data). ctf73 was localized to overlapping clones 4199 and 6643, placing it on an -15 kb segment of dna located on the right arm of chromosome xiii between adh3 and i/v2 the temperature-sensitive cff7530 mutation was meiotically mapped and found to be located 34 cm proximal to the cin4 locus, which agrees well with the physical mapping data (see of ctf13 (see experimental procedures). in the first construct, an 11 amino acid epitope derived from the ha1 protein of influenza virus (field et al., 1988) was inserted in frame into the amino terminus of ctf13 ( figure 5c ). in the second construct, two tandem copies of the el epitope (pluta et al., 1992) , derived from the carboxy-terminal 25 amino acids of an avian coronavirus glycoprotein (machamer and rose, 1987) were placed in frame at the amino terminus of the ctf13 orf under the transcriptional control of the gal7 promoter ( figure 5c ). both epitope-tagged ctf13 derivatives were able to rescue viability in a ctfl3 al::his3 null strain. extracts were prepared from cells carrying either wildtype or epitope-tagged ctf13, reacted with 32p-labeled cdeiii dna, and complexes were resolved on a nondenaturing gel (figure 7) . a single band corresponding to a cdeiii-protein complex was observed (figure 7 , lanes 1, 7, and 10); no complex was observed with a nonfunctional cdeiii variant (data not shown). the addition of antiepitope antibodies to binding reactions containing extracts from cti73 mutant strains rescued by the respective ctfl3-epitope fusion protein resulted in the appearance of a complex with significantly decreased electrophoretic mobility (figure 7, lanes 2-4 and 11 ). 
figure 7 legend. dna-protein complexes formed with 32p-labeled cdeiii probe and whole-cell extracts were analyzed on a nondenaturing acrylamide gel. antibodies were added to preformed complexes, and samples were incubated for 20 min at room temperature before gel analysis. unbound probe was run off the bottom of the gel. lanes 1-6, extracts from ctf13Δ1::his3 null cells carrying an ha epitope-ctf13 fusion (see figure 5c) and incubated with antibodies at various dilutions; lanes 7-9, extracts from ctf13Δ1::his3 null cells carrying a ctf13 plasmid reacted with the indicated antibody (controls for lanes 1-6); lanes 10-12, extracts from ctf13-30 cells carrying an e1 epitope-ctf13 fusion (see figure 5c) and incubated with indicated antibodies (control is lane 6). ha indicates the addition of 12ca5 monoclonal antibody, which is directed against the ha epitope; e1 indicates the addition of a polyclonal serum directed against the e1 epitope; peptide indicates the addition of ha peptide to 1 mm prior to the addition of 12ca5 antibody.

this supershift is clearly antibody specific, because antibodies directed against the e1 epitope did not recognize the ha-ctf13 fusion protein (figure 7, compare lanes 6 and 4) and antibodies directed against the ha epitope did not recognize the e1-ctf13 fusion protein (figure 7, compare lanes 12 and 11). as expected, the supershift was also shown to require the presence of e1-ctf13 (figure 7, compare lanes 6 and 11) or ha-ctf13 (figure 7, compare lanes 8 and 9 with lanes 3 and 4) and to be competed by ha peptide (figure 7, compare lanes 5 and 3). these results show that the supershifted band shown in lanes 2-4 and 11 of figure 7 is composed of a complex containing proteins, dna, and antibody. these results demonstrate that ctf13 is present in the protein complex that binds to the essential cdeiii region of s. cerevisiae cen dna in vitro.
because all of the cdeiii-protein complexes formed in our reactions could be supershifted by antibodies directed against epitope-tagged ctf13, the stoichiometry of ctf13 and dna in the complexes must be at least 1 to 1. we conclude from these data that ctf13 is a major component of the yeast kinetochore, which, probably in combination with other proteins, interacts with cdeiii. phenotypic analysis the ctf13-30 mutant allowed transcriptional readthrough of a test cen and stabilized a test dicentric chromosome fragment. further phenotypic analysis of this mutant revealed defects consistent with defective kinetochore function. the colony color assay for chromosome fragment stability can be used to monitor the rates of chromosome fragment loss and nondisjunction events in diploids. these rates were measured for a ctf13-30 homozygous diploid and its wild-type parent at permissive temperature (25°c). the ctf13-30 homozygous diploid exhibited an approximately 50-fold elevation in the rates of both nondisjunction and loss of the chromosome fragment (table 3a). the rates of mitotic missegregation and recombination of a suitably marked endogenous chromosome iii were also measured. the mitotic missegregation rate of chromosome iii was elevated 10-fold in the ctf13-30 homozygous diploid, while the mitotic recombination rate was elevated only 4-fold (table 3b). we conclude that the ctf13-30 mutation confers mitotic segregation and recombination phenotypes consistent with a role in the segregational machinery. ctf13-30 causes cells to arrest at the g2/m phase of the cell cycle when shifted to the nonpermissive temperature. flow cytometric analysis of dna content per cell revealed an accumulation of cells with a g2 dna content during log phase growth at the permissive temperature and a single peak of g2 content dna after arrest at the nonpermissive temperature (figure 8a).
quantitation of cell and nuclear morphology at the permissive temperature also indicated an accumulation of cells with a g2 content; 13% of cells were large budded with the nucleus at the neck in a ctf13-30 background, while only 2% of wild-type cells had this morphology (figure 8c). ctf13-30 is a cdc-like mutation that arrests with a cell morphology indicative of the g2/m preanaphase portion of the cell cycle. after 3 hr at nonpermissive temperature (38°c), approximately 80% of ctf13-30 homozygous diploid cells had arrested as large-budded cells with an undivided nucleus positioned at or near the neck between the mother and daughter cells. the mitotic spindle was very short in virtually every cell; a medium or long (anaphase b-like) spindle was never seen (figure 8b, upper panels). the cdc arrest is leaky in ctf13-30 at 38°c, and the uniform cell morphology decays with time (see figures 8b and 8c). the mitotic spindle phenotype also becomes less uniform with the appearance of misaligned and aberrant-looking spindles. interestingly, after 5 hr at the nonpermissive temperature, a "cut"-like phenotype is observed in approximately 10% of the population (though present in only 2% of the population at the 2 hr time point). this morphology is reminiscent of the phenotype of schizosaccharomyces pombe cut mutants (hirano et al., 1986) as well as of the phenotype observed in topoisomerase ii mutants of s. cerevisiae (holm et al., 1985). we define this cut-like cell morphology as a very narrow-necked, large-budded cell in which the nucleus straddling the neck has a pinched appearance (see figure 8b, lower panel). these data demonstrate that the ctf13-30 mutation results in a defect revealed in the g2/m phase of the cell cycle, consistent with a defect in kinetochore function. the ctf strain s42 (ctf14) was also identified by both secondary screens as a putative kinetochore mutant.
a clone that complemented the temperature sensitivity of ctf14-42 was obtained and mapped to chromosome vii essentially as described for ctf13-30 (data not shown). nuclear division cycle 10 (ndc10), recently identified by goh and kilmartin (1993) as an essential gene involved in chromosome segregation in s. cerevisiae, is identical to cbf2, a gene recently identified by jiang et al. (1993) that encodes the 110 kd subunit of the cbf3 complex (lechner and carbon, 1991). multiple internal restriction fragments from the ndc10 clone were found to comigrate with fragments from the ctf14-42 complementing clone. moreover, the temperature-sensitive mutation ctf14-42 failed to complement the temperature sensitivity of ndc10-1. we conclude that the ctf14-42 mutation is present at the ndc10/cbf2 locus. thus, the only two ctf mutants identified by both secondary screens as putative kinetochore mutants have now been shown to be defective in essential kinetochore components.

cells is slightly tighter (a single g2 peak) when incubated at 36°c for 3 hr (data not shown). shown above the columns. the numbers shown represent the percentage of total cells scored; small-budded cells with a single nucleus were quantitated (percentage is 100 minus the sum indicated) but are not shown. at 25°c 1500 cells were scored; 200 cells were scored for each time point at 36°c.

although the cen dna sequence elements from budding yeast have been cloned for over 10 years (clarke and carbon, 1980), identification of the genes encoding proteins essential for centromere function has proven difficult. we describe a genetic approach using two independent in vivo genetic assays to screen an existing large reference set of mitotic segregation mutants (the ctf collection; spencer et al., 1990) for altered kinetochore integrity. in combination, these assays identified two mutant strains, s30 (ctf13) and s42 (ctf14), as putative kinetochore mutants.
biochemical and further phenotypic analysis indicated that the ctf13 gene product was indeed an essential structural component of the kinetochore, and the ctf14 gene product was shown to be identical to ndc10/cbf2, a recently characterized essential kinetochore component (goh and kilmartin, 1993; jiang et al., 1993). mutants theoretically, the transcriptional readthrough assay might result in both false negatives (e.g., kinetochore protein mutations that fail to relieve a transcription block) and false positives (e.g., mutations that affect transcriptional regulation or chromatin structure). similarly, a dicentric chromosome could be stabilized by mutants affecting dna metabolism or spindle integrity and assembly. in light of these caveats, we used these assays to screen a set of mutants previously shown to have defects in mitotic chromosome segregation. it is not known how efficient either of these secondary assays would be in a primary screen. our experience suggests that kinetochore mutants can be recognized by the combined phenotype of transcriptional readthrough and dicentric chromosome stabilization. in theory, the degree to which the integrity of the kinetochore must be compromised to result in either of these phenotypes could be quite different. in a simplified view, mutations affecting the interaction of the kinetochore complex with the cen dna should be detected by both assays, while mutations affecting kinetochore-to-microtubule interactions may be detected only by the dicentric stabilization assay. however, we note that kinetochores participate in several distinct processes in vivo, including microtubule capture, congression to the metaphase plate, and poleward migration. in addition, a viable kinetochore mutation would most likely be a leaky mutation, which might exhibit complex consequences following the primary defect.
thus, in reality, it is quite possible that some of the mutations identified by only one of these secondary screens indeed disrupt kinetochore integrity but perhaps lead to more subtle alterations than ctf13-30 or ctf14-42. it is clear that these two screens, whether used alone or in combination, have the potential to aid in the identification of additional regulatory and structural components of the s. cerevisiae kinetochore. they may also be adaptable to other organisms. ctf13 is an essential kinetochore component we have presented a combination of in vivo and in vitro evidence demonstrating that the ctf13 protein is a component of the s. cerevisiae kinetochore. in vivo, the ctf13-30 mutation confers relaxation of a transcriptional block mediated by the kinetochore and stabilizes a test dicentric chromosome fragment. in vitro, we demonstrate that the ctf13 protein is a component of the cen dna-protein complex and, specifically, that it interacts with cdeiii. the predicted molecular mass of 58 kd for the ctf13 protein is the approximate size seen on a western blot (data not shown). therefore, the ctf13 protein seemed to be a very good candidate for the 58 kd subunit of the cbf3 complex (lechner and carbon, 1991), and in fact, the predicted amino-terminal amino acid sequence of the ctf13 protein was found to be identical to tryptic peptide sequence obtained from the purified 58 kd protein component of the cbf3 complex (j. lechner, personal communication). the ctf13 protein appears to be limiting for cdeiii-protein complex formation in vitro. when extracts derived from a strain overproducing ctf13 are used in the band shift assays (see figure 7, lanes 10-12), the amount of cdeiii-protein complex formed is increased relative to the amount seen with extracts from nonoverproducing cells (see figure 7, lanes 1-9). also, a ctf13 heterozygous diploid strain (ctf13Δ1::his3/ctf13) exhibits a mild but detectable sectoring phenotype.
this indicates that the amount of ctf13 protein produced by one copy of the ctf13 locus is not sufficient to keep the fidelity of chromosome segregation at a wild-type level. these observations suggest that ctf13 may be limiting for kinetochore function in vivo. phenotypic analysis of the temperature-sensitive ctf13-30 mutation is consistent with a kinetochore defect. ctf13-30 causes an increase in the mitotic rate of chromosome missegregation and results in a terminal phenotype indicative of a defect in the g2/m phase of the cell cycle. it has been previously proposed that missegregation mutants will fall into two broad groups: those affecting the pathways of dna metabolism and those affecting the mitotic segregational machinery. a mutation affecting dna metabolism was expected to cause increased rates of both chromosome loss and mitotic recombination, while a mitotic segregation mutant was expected to cause only an increase in chromosomal loss events. phenotypic analysis of a known dna metabolic mutant, dna polymerase α (cdc17; hartwell and smith, 1985), and a known spindle mutant, β-tubulin (tub2; huffaker et al., 1988), supported this model. examination of these phenotypes for ctf13-30 strains revealed a significant increase in the rate of chromosomal missegregation with only a very slight elevation in the rate of mitotic recombination (table 3b). in addition, we now have the ability to distinguish between loss (1:0) and nondisjunction (2:0) missegregation events, and we find that the rates of both of these events are significantly elevated in a ctf13-30 background (table 3a). thus a known kinetochore mutation, ctf13-30, has been shown to result in phenotypes consistent with previous expectation, and we can perhaps extend this expectation to include a predicted increase in the rates of both chromosomal loss and nondisjunction in mitotic segregation mutants.
the ctf13-30 kinetochore defect may be recognized by a cell cycle checkpoint critical events in the cell cycle are temporally ordered and coordinated by a series of dependency pathways in which late events are dependent on the successful completion of earlier events. these dependencies can result from a substrate-product mechanism or from extrinsic control by a monitoring function termed cell cycle checkpoint control (hartwell and weinert, 1989). checkpoints are responsible for a subset of observed cell cycle arrests or delays. cell cycle arrests or delays associated with kinetochore defects have been reported in several systems. in animal cells, a delay in the initiation of anaphase is correlated with the failure of chromosomes to achieve bipolar attachment to the spindle (rieder and alexander, 1989; zirkle, 1970), and a metaphase arrest is observed with kinetochore disruption by injection of anti-centromere antibodies (bernat et al., 1990). in yeast, one aberrant kinetochore on a single chromosome can cause a mitotic delay (spencer and hieter, 1992). ctf13-30 strains exhibit a g2/m phase accumulation in logarithmic cultures at permissive temperature and a preanaphase arrest morphology at nonpermissive temperature. cell morphology and dna content do not critically distinguish g2 and m phases in yeast. however, at permissive temperature, ctf13-30 strains exhibit a detectable increase in h1 kinase activity relative to ctf13 controls, and at nonpermissive temperature, h1 kinase activity levels in ctf13-30 strains are equivalent to those in nocodazole-arrested strains (data not shown). thus, h1 kinase activity measurements suggest that the ctf13-30 mutation causes an accumulation in m phase. it is tempting to speculate that this cell cycle alteration is similar to those described above and that these are a result of checkpoint control exerted in the presence of defective kinetochores.
checkpoints are defined by two experimental criteria: first, identification of mutations or conditions that allow bypass of an arrest or delay (resulting in the accumulation of errors) and second, an observed error correction when cell cycle delay is reintroduced experimentally. alternatively, defective substrate-product conversion that becomes rate limiting for progress may also result in cell cycle delay. these alternatives have not been distinguished for the delays associated with kinetochore defects. conditional mutations in kinetochore proteins will provide important tools for exploring the relationship between kinetochore structure and cell cycle progression. examination of the terminal phenotype of ctf13-30 mutants raises several interesting questions. the fact that the ctf13-30 defect does not lead to a permanent and uniform arrest morphology may simply be a result of the presence of a small amount of active ctf13 protein that eventually allows completion of mitosis, or mitosis may never be completed but cytokinesis may still eventually be attempted in some cells. consistent with the latter possibility, we have observed the accumulation of cells with a cut-like phenotype: 10% of all cells exhibit this phenotype after 5 hr at the nonpermissive temperature. bernat et al. (1990) describe a similar cut-like phenotype after injection of mammalian cells with anti-centromere antibodies and propose that it is a result of the cells' eventual attempt to undergo cytokinesis after prolonged mitotic arrest. because it is not known whether this subset of the ctf13-30 population is still capable of dividing, it is unclear whether cytokinesis has trapped nuclei in these cells or whether they are caught undergoing nuclear transits at the time of fixation (palmer et al., 1989). the terminal phenotype of ctf13-30 is quite different from the terminal phenotype of the other described temperature-sensitive kinetochore mutant, ndc10-1 (goh and kilmartin, 1993).
ndc10-1 mutants exhibit detachment of the chromosomes from one spindle pole and progression through the cell cycle in the absence of chromosome segregation (most cells produce one aploid daughter and one daughter of increased ploidy). if there is checkpoint control exerted in response to events at the kinetochore, the ndc10-1 defect is not recognized. perhaps this is because the ndc10 protein itself is involved in the recognition and/or signaling of a dysfunctional complex, or alternatively, checkpoint control may be disabled by complete disruption of kinetochore structure. future experiments addressing the relationships of kinetochore proteins to the control of progression through mitosis should help define important molecular determinants of the temporal order of events in chromosome segregation. will the molecular dissection of the s. cerevisiae kinetochore aid the understanding of kinetochore function in more complex eukaryotes? at this time, analysis of the dna sequence and protein component requirements of the kinetochore is significantly more advanced in s. cerevisiae than in any other eukaryotic organism, although there is great speculation about the relevance of these studies to the understanding of the much larger and morphologically more elaborate kinetochores present in other eukaryotes. while there may be a need for additional components to ensure fidelity in more complex eukaryotes, we think it is probable that the basic mechanisms of the segregational process, including those involved in centromere function, will have been conserved through evolution. a repeat subunit model for the centromere-kinetochore complex has recently been proposed by zinkowski et al. (1991). this model describes the kinetochore as organized in multiple small repeat units that fold together into a contiguous plate-like structure when condensed at metaphase. zinkowski et al. propose that each unit is capable of microtubule binding and segregational function.
in this context, the s. cerevisiae kinetochore, which binds a single microtubule, could represent the simplest ancestral unit of the eukaryotic kinetochore (fitzgerald-hayes et al., 1982; koshland et al., 1987). identification and characterization of the s. cerevisiae kinetochore components will facilitate the definition of the activities necessary for the completion of proper mitotic segregation in this organism and may well provide substrates for the identification of kinetochore components in other eukaryotic organisms. yeast strains and media the ctf and wild-type parental strains containing chromosome fragments that can be monitored by a visual assay have been previously described (spencer et al., 1990; shero et al., 1991). the ctf collection of 136 originally isolated mutants can be represented in 18 complementation groups and 41 single isolates. all ctf mutant isolates that are members of a complementation group retain the original isolate number as an allele number (e.g., s30 contains ctf13-30). one member of each complementation group (the isolate with the most severe sectoring phenotype) and 19 single isolates (those that were his3-) were tested in the two kinetochore screens. media for yeast growth and sporulation were as described (rose et al., 1990) except that where sectoring was examined, adenine was added to 6 µg/ml to minimal (sd) medium to enhance the development of red pigment in ade2-101 strains. x-gal plates were made as described for synthetic complete (sc) medium (rose et al., 1990) except for the addition of 0.1 m napo4 (ph 6.6) and 40 µg/ml x-gal (5-bromo-4-chloro-3-indolyl-β-d-galactopyranoside) (use a 20 mg/ml stock in dimethylformamide). all yeast transformations were done by the method of ito et al. (1983). readthrough assay the reporter construct schematically pictured in figure 1 was modified from pgab (u. vijayraghavan and j. abelson, unpublished data) obtained from r. parker.
(pgab is a modification of pyahb2 [vijayraghavan et al., 1986], a cen-ars plasmid containing an actin-lacz fusion gene that has been used extensively to study splicing in s. cerevisiae [vijayraghavan et al., 1986; cellini et al., 1986] mann et al., 1988). the dna sequence and orientation of the test cen6 sequences in pkf16 were verified by sequencing (sanger et al., 1977; hattori and sakaki, 1986). the resulting plasmids, pkf19 and pkf44, contain wild-type and mutant (cdeiii-15c) cen6 sequences, respectively, in the orientation placing cdei 5' to cdeiii. in control experiments, pkf16, pkf19, and pkf44 were transformed into the wild-type strain yph102 (matα ura3-52 lys2-801 ade2-101 his3-Δ200 leu2-Δ1), and β-galactosidase assays were done on independent ura+ transformants (see figure 1). the structurally dicentric plasmids (all ycp50 derivatives) were maintained in a functionally monocentric state by keeping transformed strains on medium containing galactose as the carbon source, causing transcriptional inactivation of the test centromeres (hill and bloom, 1987). β-galactosidase assays were performed essentially as described (rose et al., 1990) following the protocol for assay of crude extracts, except that cells were grown in scgal-his liquid medium or scraped off scgal-his plates. the values for optical density at 420 nm were zeroed to an isogenic yeast strain that did not contain a reporter plasmid. for screening of the ctf collection, the reporter was integrated into chromosome xv as follows: the gal10-actin-test cen6-lacz fragment described above was inserted into a genomic xhoi site immediately 3' to the his3 gene contained on a pbr322-based plasmid, psz62-xbai (mcleod et al., 1986), kindly provided by j. broach. the resulting bamhi fragment containing his3 and the test reporter fragment was transformed into yph276, selecting for replacement of the his3-Δ200 locus on chromosome xv.
independent his+ transformants were picked and analyzed by southern blotting to verify insertion of the reporter construct into chromosome xv at the his3 locus. pkf71 contains the wild-type cen6 reporter, and pkf72 contains the mutant cdeiii (15c) cen6 reporter inserted into the his3 bamhi fragment. yph977 and yph978 contain the pkf71- and pkf72-derived bamhi fragments, respectively, and were maintained in medium containing galactose. strains were tested for the production of blue color on medium containing the chromogenic substrate of β-galactosidase, x-gal. yph977 colonies appear white (this progresses to a very faint blue color after several days), while yph978 colonies develop a deep blue color. the reporter containing the wild-type cen6 (pkf71) was inserted into chromosome xv in each of the ctf mutants as follows. each ctf strain was made competent for transformation in sc medium containing 2% galactose and transformed with the bamhi fragment of pkf71. transformants were selected, and the colony was purified on scgal-his plates at 25°c. two independent transformants of each ctf strain tested were then plated at a low density (~200

two p679 and two pkf77 transformants of each ctf strain tested were streaked onto synthetic complete dextrose plates containing a limiting amount of adenine. the switch to dextrose as a carbon source causes the gal10 promoter to be turned off, resulting in activation of the second conditional centromere and a functionally dicentric chromosome fragment. sectoring phenotypes were directly compared with those of a yph276 p679 or pkf77 transformant streaked onto the same plate.
the ctf strains tested with the stabilization of a dicentric assay were: s59 (ctf2), s50 (ctf4), s31 (ctf5), s53 (ctf6), s10 (ctf7), s9 (ctf8), s16 (ctf9), s67 (ctf11), s16 (ctf12), s30 (ctf13), s42 (ctf14), yph960 mata his3-Δ200 ade2-101 lys2-801 leu2-Δ1 cdl&124 cfvll (radld.yptf 275) ura3 sup11, s61 (ctflir), yph961 mata ura3-52 lys2-801 ade2-101 his3-Δ200 trp1-Δ1 leu2-Δ1 ctf18-160 cflll (cen3.l. yph278) ura3 sup11, s3, s4, s12, s17, s20, s22, s41, s47, s55, s56, s62, s63, and s64. s31 (ctf5) and s20 were unscorable in this screen because the chromosome fragment present in the p679 derivative strains was extremely unstable. true positives were verified in multiple independent transformants. one source of false positives was transformants containing two chromosome fragments, only one of which carried the conditional secondary centromere. these false positives were easily identified by the demonstration that sectored colonies were his+. for s30 (ctf13), the phenotype of stabilization of the test dicentric chromosome was shown to be due to a mutation in the ctf13 gene product by transformation of a pkf76 derivative of s30 with pkf11, a prs314-based (sikorski and hieter, 1989) plasmid carrying the ctf13 locus. the presence of the wild-type ctf13 gene product resulted in destabilization of the dicentric chromosome fragment back to the level seen in the wild-type parent, yph276. characterization of ctf13 the 2.2 kb sau3a subclone that rescues the temperature sensitivity of s30 (ctf13) was inserted into the polylinker of prs314 (sikorski and hieter, 1989), resulting in pkf11. deletion derivatives of the pkf11 insert were made using existing restriction sites (see figure 5). a bglii to polylinker deletion, as well as a clai to polylinker deletion, was unable to rescue the temperature sensitivity of s30 (ctf13).
the dna sequence encoding the entire orf was determined using a set of unidirectional deletions (henikoff, 1987) by standard methods (sanger et al., 1977; hattori and sakaki, 1986). the sequence of the second strand of the orf was obtained using synthetic oligonucleotides as primers. the ctf13 clone was shown to correspond to the ctf13-30 locus, and ctf13 was shown to be an essential gene in s. cerevisiae by using the ctf13 clone to direct an integration event (sikorski and hieter, 1989) that replaced a majority of the ctf13 orf with vector and his3 sequences. the integration vector, pkf93, was constructed by inserting the ~600 bp bglii (polylinker)-bglii fragment and the ~200 bp clai-ecori (polylinker) fragment from pkf11 (see figure 5b) into the bamhi site and clai-ecori sites of prs303 (sikorski and hieter, 1989), respectively. pkf93 was linearized with ecori and transformed into a ctf13-30/ctf13 heterozygous diploid strain, yph974, selecting for his+ transformants. integration of pkf93 should delete the ctf13 orf from amino acid 57 to amino acid 467 (see figure 5b). approximately half of the his+ diploid transformants obtained exhibited the ctf13-30 sectoring phenotype, indicating that the ctf13-30 locus was being targeted by pkf93. the integration of pkf93 and deletion of ctf13 sequences was confirmed by southern analysis (data not shown). two sectoring diploid isolates (ctf13 locus deleted) and two nonsectoring diploid isolates (ctf13-30 locus deleted) were sporulated, and tetrads were dissected. viability segregated 2:2 in all 31 tetrads dissected, and all viable spores were his-. all viable spores resulting from the sectoring diploids were temperature sensitive, and all of the viable spores resulting from the nonsectoring diploids were not temperature sensitive. ctf13 was physically mapped by the previously described method of chromosome fragmentation (gerring et al., 1990a), using the 2.2 kb ctf13 fragment.
the sizes of the resulting stable chromosome fragments were determined by orthogonal field-alternation gel electrophoresis (ofage) analysis (carle and olson, 1984), and assignment of ctf13 to an arm of chromosome xiii was accomplished by hybridization of a left arm telomere-adjacent probe, tub3, to a southern blot of the ofage gel. tub3 was obtained from p. schatz, and the probe used was a 1.2 kb hindiii fragment, radioactively labeled with 32p (feinberg and vogelstein, 1984). tub3 hybridized to the 445 kb proximal fragment. to obtain a meiotic map position, a diploid strain was constructed that was heterozygous for ctf13 and cin4 (ctf13-30/+, +/cin4::ura3). the meiotic distance was calculated from the following data by using the formula of perkins: ctf13-cin4 34 cm (parental ditype/nonparental ditype/tetratype = 42/2/56). ctf13 was placed proximal to cin4 by probing the ctf13 chromosome fragmentation ofage blots with a 2 kb saci-kpni cin4 fragment obtained from a. hoyt. cin4 hybridized to the 475 kb distal ctf13 chromosome fragment, placing ctf13 proximal to cin4. analysis the plasmid containing the e1 tag fused to ctf13, pkf60, was constructed from the base plasmid p414geu1 (j. kroll, unpublished data). p414geu1 has a 460 bp gal1 promoter fragment cloned into the kpni site and two tandem copies of the e1 tag sequence, described by pluta et al. (1992), inserted in frame into the apai and xhoi sites of prs414 (sikorski and hieter, 1989). the gal1 promoter directs transcription from its own atg toward the polylinker. an ecori fragment containing the entire 2.2 kb insert of pkf11 was cloned into the ecori site of p414geu1 in the appropriate transcriptional orientation. the 5' ~600 bp of the ctf13-containing fragment (up to the bglii site; see figure 5b) were removed and replaced with an ~200 bp polymerase chain reaction product containing sequences from the atg of ctf13 to the bglii site.
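the ctf13-cin4 map distance quoted above can be reproduced from the tetrad counts. the following is a minimal sketch, assuming the standard form of perkins' formula, cm = 100 x (tetratype + 6 x nonparental ditype) / (2 x total tetrads); the function name is hypothetical and is not taken from the original methods.

```python
def perkins_map_distance(pd: int, npd: int, tt: int) -> float:
    """Map distance in centimorgans from tetrad class counts.

    Assumes the standard Perkins formula:
        cM = 100 * (TT + 6 * NPD) / (2 * (PD + NPD + TT))
    pd  = parental ditype tetrads
    npd = nonparental ditype tetrads
    tt  = tetratype tetrads
    """
    total = pd + npd + tt
    if total == 0:
        raise ValueError("at least one tetrad is required")
    return 100.0 * (tt + 6 * npd) / (2 * total)


if __name__ == "__main__":
    # Tetrad counts reported for the ctf13-cin4 interval (PD/NPD/TT = 42/2/56).
    print(perkins_map_distance(42, 2, 56))
```

with the reported counts the function returns 34 cm, matching the distance given in the text. the factor of 6 on the nonparental ditype class corrects for double crossovers, which is why the formula is preferred over a simple recombinant fraction for intervals of this length.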
this allowed the in-frame fusion of the tandem e1 tags to ctf13 under the transcriptional control of gal1 (see figure 5c). pkf60 was transformed into yph972 and shown to rescue the temperature sensitivity caused by the ctf13-30 mutation on both galactose- and dextrose-containing media. the plasmid containing the ha epitope fused to ctf13, psf197a, was constructed by using a synthetic oligonucleotide to fuse the ha epitope and linker sequences to the amino terminus of ctf13 (see figure 5c). the fusion protein and ~200 bp of 3' noncoding sequence from the ctf13 locus were cloned into prs315 (sikorski and hieter, 1989) downstream of a 625 bp fragment of 5' flanking dna that is presumed to include the ctf13 promoter. psf197a was transformed into yph975. transformants were streaked onto medium containing 5-fluoroorotic acid to select against the ctf13-ura3 plasmid (boeke et al., 1987), and it was shown that psf197a would rescue viability in the resulting ctf13Δ1::his3 strain. the preparation and analysis of cbf3-dna complexes was performed using a modification of procedures previously described by lechner and carbon (1991). cells in log phase were harvested by centrifugation, frozen in liquid nitrogen, and mechanically disrupted by fragmentation with a liquid nitrogen-cooled mortar and pestle in 30 mm sodium phosphate (ph 7.0), 60 mm β-glycerophosphate, 1 m kcl, 6 mm egta, 6 mm edta, 6 mm naf, 10% glycerol, 1 mm phenylmethylsulfonyl fluoride, and 10 µg/ml (each) leupeptin, pepstatin, and chymostatin. whole-cell extract (40 µg) was incubated for 30 min at room temperature with 20 fmol of 32p-labeled dna probe, 5 µg of salmon sperm dna, 5 µg of poly(di-dc), and 10 µg of bovine serum albumin in 30 µl of 10 mm hepes (ph 6.0), 1 mm naf, 6 mm mgcl2, 10% glycerol, and kcl at a final concentration of 125 mm. the 88 bp dna probe was derived from cen3 and spans the core region of cdeiii, from 5 bp to the left of cdeiii to 59 bp to the right of cdeiii.
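for orientation, the amounts in the binding reaction above can be converted to concentrations. this is a small sketch, assuming the reaction volume is 30 µl and the extract amount is 40 µg (the units are garbled in the source, so both are assumptions); the helper names are hypothetical.

```python
def probe_concentration_nM(amount_fmol: float, volume_ul: float) -> float:
    """Molar concentration in nM; fmol per microliter equals nmol per liter."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_fmol / volume_ul


def protein_concentration_mg_per_ml(amount_ug: float, volume_ul: float) -> float:
    """Mass concentration in mg/ml; microgram per microliter equals mg per ml."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_ug / volume_ul


if __name__ == "__main__":
    # Assumed reaction: 20 fmol labeled probe and 40 ug extract in 30 ul.
    print(round(probe_concentration_nM(20, 30), 2), "nM probe")
    print(round(protein_concentration_mg_per_ml(40, 30), 2), "mg/ml extract protein")
```

under these assumptions the labeled probe is present at roughly 0.67 nm and the extract protein at roughly 1.3 mg/ml, so the unlabeled competitor dna is in large mass excess over the probe.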
binding reactions were electrophoresed on 4% polyacrylamide gels as described (ng and carbon, 1987) and visualized by autoradiography.
basic local alignment search tool
isolation of the gene encoding the saccharomyces cerevisiae centromere-binding protein cp1
chromatin conformation of yeast centromeres
5-fluoroorotic acid as a selective agent in yeast molecular genetics
yeast centromere binding protein of the helix-loop-helix protein family, is required for chromosome stability and methionine prototrophy
we thank c. connelly for significant contributions to this work. we would like to thank ft. parker for sending pgab, j. kroll for allowing us to use p414geu1, and a. pluta for kindly providing us with polyclonal el antibodies. we would like to acknowledge s. holloway, r. sikorski, and n. kouprina for helpful theoretical discussions and h. varmus and t. mitchison for support during this project. we are also grateful to j. kilmartin, j. carbon, and j. lechner for communicating results prior to publication. we thank d. koshland, w. earnshaw, and t. kelly for critical reading of the manuscript. k. f. d. is a student in the predoctoral training program in human genetics at johns hopkins (national institute of general medical sciences grant p32gm07614). s. t. is supported by the national institutes of health departmental training grant 5t32ca09139. p. k. s. and a. a. h. are biomedical scholars of the lucille p. markey charitable trust. this work was supported by a national institutes of health grant (ca16519) to p. h. and an american cancer society grant (cd-509) to f. s.
carbon, j., and clarke, l. (1990). centromere structure and function in budding and fission yeast. new biologist 2, 10-19.
carle, g. f., and olson, m. (1984). separation of chromosomal dna molecules from yeast by orthogonal-field-alternation gel electrophoresis. nucl. acids res. 12, 5647-5665.
cellini, a., parker, r., mcmahon, j., guthrie, c., and rossi, j. (1986).
activation of a tactaac box in the saccharomyces cerevisiae actin intron. mol. cell. biol. 6, 1571 -1576 . clarke, l., and carbon, j. (1960 . isolation of a yeast centromere and construction of functional small circular chromosomes. nature 287, 504-509.cottarel, g., shero, j., hieter, p., and hegemann, j. (1969) . a 125base-pair cen8 dna fragment is sufficient for complete meiotic and mitotic centromere functions in saccharomyces cerevisiae. mol. cell. biol. 9, 3342-3349. densmore, l., payne, w., and fitzgerald-hayes, m. (1991) . in vivo genomic footprint of a yeast centromere. mol. cell. biol. 77, 154-165. feinberg, a. p., and vogelstein, b. (1964) . a technique for radiolabeling dna restriction endonuclease fragments to high specific activity. anal. biochem. 732, 6-13. field, j., nikawa, j., broek, d., macdonald, b., rogers, l., wilson, i., lerner, r., and wigler, m. (1966) . purification of a ras-responsive adenylyl cyclase complex from saccharomyces cerevisiae by use of an epitope addition method. mol. cell. biol. 8, 2159 -2165 . fitzgerald-hayes, m.. clarke, l., and carbon, j. (1962 . nucleotide sequence comparisons and functional analysis of yeast centromere dnas. cell 29,235-244. funk, m., hegemann, j., and philippsen, p. (1969) . mellor, j., jiang, w., funk, m., rathjen, j., barnes, c., hiz, t., hegemann, j., and philippsen, p. (1990) . cpfl, a yeast protein which functions in centromeres and promoters. embo j. 9, 4017-4028. mullis, k. b., and faloona, f. a. (1987) . specific synthesis of dna in vitro via a polymerase catalysed chain reaction. meth. enzymol. 755, 335-350. newlon, c. (1988) . yeast chromosome replication and segregation. microbial. rev. 52, 588-801. ng, r., and carbon, j. (1987) . mutational and in vitro protein-binding studies on centromere dna from saccharomyces cerevisiae. mol. cell, biol. 7, 4522-4534.palmer, r., koval, m., and koshland, d. (1989) . the dynamics of chromosome movement in the budding yeast saccharomyces cerevisiae. j. 
cell biol. 109, 3355-3388. perier, f., and carbon, j. (1992) . a colony color assay for saccharomyces cefevisiae mutants defective in kinetochore structure and function. genetics 132, 39-51.peterson, j., and ris, h. (1978) . electron microscope study of the spindle and chromosome movement in the yeast s. cerevisiae. j. cell sci. 22, 219-242.pluta, a. f., cooke, c. a., and earnshaw, w. c. (1990) . structure of the human centromere at metaphase. trends biochem. sci. 75, 161-185.pfuta, a. f., saitoh, n., goldberg, i., and earnshaw, w. c. (1992). identification of a subdomain of cenp-b that is necessary and sufficient for focalization to the human centromere.j. cell biol. 716,1081-1093. rieder, c. l. (1982) . the formation, structure and composition of the mammalian kinetochore fiber. int. rev. cytol. 79, l-88. rieder, c. l., and alexander, s. p. (1989) . the attachment of chromosomes to the mitotic spindle and the production of aneuploidy in newt lung cells. in mechanisms of chromosome distribution and aneuploidy, m. resnick and b. vig, eds. (new york: liss), pp. 185-194. roof, d. m., meluh, p. b., and rose m. d. (1992) . kinesin-related proteins required for assembly of the mitotic spindle. j. cell biol. 118, 95-108. rose, m. d., winston, f., and hieter, p. (1990) . methods in yeast genetics (cold spring harbor, new york: cold spring harbor laboratory pressj. sanger, f., nicklen, s., and coulson, a. r. (1977) . dna sequencing with chain-terminating inhibitors. proc. natl. acad. sci. usa 74,5483-5487. saunders, w. s., and hoyt, m. a. (1992) the accession number for the ctf73 sequence reported in this paper is l10083. 
key: cord-271032-imc6woht authors: schulte-schrepping, jonas; reusch, nico; paclik, daniela; baßler, kevin; schlickeiser, stephan; zhang, bowen; krämer, benjamin; krammer, tobias; brumhard, sophia; bonaguro, lorenzo; de domenico, elena; wendisch, daniel; grasshoff, martin; kapellos, theodore s.; beckstette, michael; pecht, tal; saglam, adem; dietrich, oliver; mei, henrik e.; schulz, axel r.; conrad, claudia; kunkel, désirée; vafadarnejad, ehsan; xu, cheng-jian; horne, arik; herbert, miriam; drews, anna; thibeault, charlotte; pfeiffer, moritz; hippenstiel, stefan; hocke, andreas; müller-redetzky, holger; heim, katrin-moira; machleidt, felix; uhrig, alexander; bosquillon de jarcy, laure; jürgens, linda; stegemann, miriam; glösenkamp, christoph r.; volk, hans-dieter; goffinet, christine; landthaler, markus; wyler, emanuel; georg, philipp; schneider, maria; dang-heine, chantip; neuwinger, nick; kappert, kai; tauber, rudolf; corman, victor; raabe, jan; kaiser, kim melanie; vinh, michael to; rieke, gereon; meisel, christian; ulas, thomas; becker, matthias; geffers, robert; witzenrath, martin; drosten, christian; suttorp, norbert; von kalle, christof; kurth, florian; händler, kristian; schultze, joachim l.; aschenbrenner, anna c.; li, yang; nattermann, jacob; sawitzki, birgit; saliba, antoine-emmanuel; sander, leif erik title: severe covid-19 is marked by a dysregulated myeloid cell compartment date: 2020-08-05 journal: cell doi: 10.1016/j.cell.2020.08.001 sha: doc_id: 271032 cord_uid: imc6woht summary coronavirus disease 2019 (covid-19) is a mild to moderate respiratory tract infection, however, a subset of patients progresses to severe disease and respiratory failure. the mechanism of protective immunity in mild forms and the pathogenesis of severe covid-19, associated with increased neutrophil counts and dysregulated immune responses, remains unclear. 
in a dual-center, two-cohort study, we combined single-cell rna-sequencing and single-cell proteomics of whole blood and peripheral blood mononuclear cells to determine changes in immune cell composition and activation in mild vs. severe covid-19 (242 samples from 109 individuals) over time. hla-dr hi cd11c hi inflammatory monocytes with an interferon-stimulated gene signature were elevated in mild covid-19. severe covid-19 was marked by occurrence of neutrophil precursors, as evidence of emergency myelopoiesis, dysfunctional mature neutrophils, and hla-dr lo monocytes. our study provides detailed insights into the systemic immune response to sars-cov-2 infection and reveals profound alterations in the myeloid cell compartment associated with severe covid-19.
immune responses in blood samples in two independent cohorts of covid-19 patients. activated hla-dr hi cd11c hi cd14 + monocytes were increased in patients with mild covid-19, similar to patients with sars-cov-2 negative flu-like illness ('fli'). in contrast, monocytes characterized by low expression of hla-dr and marker genes indicative of anti-inflammatory functions (e.g. cd163, plac8) appeared in patients with severe covid-19. the granulocyte compartment was profoundly altered in severe covid-19, marked by the appearance of neutrophil precursors due to emergency myelopoiesis and of dysfunctional neutrophils expressing pd-l1 and exhibiting an impaired oxidative burst response. collectively, our study links highly dysregulated myeloid cell responses to severe covid-19.
results
dual center cohort study to assess immunological alterations in covid-19 patients
in order to probe the divergent immune responses in mild vs. severe covid-19, we analyzed blood samples collected from independent patient cohorts at two university medical centers in germany.
samples from the berlin cohort (cohort 1) (kurth et al., 2020) were analyzed by mass cytometry (cytof) and single-cell rna-sequencing (scrna-seq) using a droplet-based single-cell platform (10x chromium), while samples from the bonn cohort (cohort 2) were analyzed by multi-color flow cytometry (mcfc) and on a microwell-based scrna-seq system (bd rhapsody). we analyzed a total of 24 million cells by their protein markers and >328,000 cells by scrna-seq in 242 samples from 53 covid-19 patients and 56 controls, including 8 patients with fli (fig. 1a+b, s1a, table s1).
we first characterized alterations of the major leukocyte lineages by mass cytometry on whole blood samples from 20 covid-19 patients collected between day 4 and day 29 after symptom onset, and compared them to 10 age- and gender-matched controls and 8 fli patients. we designed two antibody panels to specifically capture alterations in mononuclear leukocytes (lymphocytes, monocytes and dcs, panel 1), and in granulocytes (panel 2, table s2). high-resolution spade analysis was performed with 400 target nodes and individual nodes were aggregated into cell subsets based on lineage-specific markers, such as cd14 for monocytes and cd15 for neutrophils (fig. s1b). uniform manifold approximation and projection (umap) analysis revealed distinct clustering of samples from covid-19 patients, fli, and healthy controls, with marked changes of the monocyte and granulocyte compartment (fig. 1c). leukocyte lineages were compared in the earliest available samples in covid-19 patients (day 4 to 13), fli, and controls (table s1, fig. 1d). since leukocyte counts were not available for all control samples, we compared the control samples for cytof ('ctrl cytof') to data from our recently published healthy control cohorts ('ctrl flow') (kverneland et al., 2016; sawitzki et al., 2020).
the proportions of all major lineages were highly similar, irrespective of the methodology (fig. 1d). cell counts of the published cohort could therefore be used as a reference to report absolute cell counts for leukocyte lineages in covid-19 samples. in line with recent reports (xintian et al., 2020), we observed elevated leukocytes and increased proportions of neutrophils in patients with severe covid-19 (fig. 1d), whereas only proportional increases in neutrophils were evident in fli and mild covid-19 patients (fig. 1d). total lymphocytes and t cells were strongly reduced in all covid-19 and fli patients, whereas non-classical monocytes were specifically depleted in covid-19 (fig. 1d). increased neutrophils in severe covid-19 and loss of non-classical monocytes in both mild and severe disease were validated in cohort 2 by mcfc (fig. s1c, table s1+3).
given the dramatic changes in various immune cell populations (fig. 1c+d), we next assessed their composition and activation state by droplet-based scrna-seq in 27 samples from 18 covid-19 patients (8 mild & 10 severe, cohort 1, table s1) collected between day 3 and day 20 after symptom onset. a total of 48,266 single-cell transcriptomes of pbmc were analyzed together with 50,783 pbmc from publicly available control datasets (21 control donors, table s1). umap and high-resolution cell type classification identified all cell types expected in the mononuclear compartment of blood with a high granularity in the monocytes, identifying five distinct clusters (cluster 0-4) (fig. 2a+s2a, table s4). monocytes in clusters 0-3 expressed cd14, whereas cluster 4 comprised the non-classical monocytes marked by fcgr3a (encoding cd16a) and low expression of cd14. separate visualization of cells in mild and severe cases revealed highly disease severity-specific clusters (fig. 2b). a distinct subset of cd14 + monocytes (cluster 1) (fig.
2a) marked by high expression of hla-dra, hla-drb1 and the co-stimulatory molecule cd83 (fig. s2d), engagement of which has been linked to prolonged expansion of antigen-specific t cells (hirano et al., 2006), was selectively detected in mild covid-19 (fig. 2c). in addition, we identified another closely related cd14 + hla-dr hi monocyte population (cluster 2), which was characterized by high expression of ifn-stimulated genes (isgs). however, upon closer analysis, this cluster was found to originate from a single donor with mild covid-19 (fig. 2a-c, fig. s2d). both cluster 1 and cluster 2 expressed high levels of the isgs ifi6 and isg15 (fig. s2d). in patients with severe covid-19, monocytes showed low expression of hla-dr and high expression of the alarmins s100a8/9/12 (cluster 3, fig. 2a-c, fig. s2d). the most prominent change in severe covid-19 was the appearance of two distinct cell populations (cluster 5+6), absent in pbmc of patients with mild covid-19 and control donors (fig. 2a). published markers (kwok et al., 2020; ng et al., 2019) identified cluster 5 and 6 as neutrophils and immature neutrophils, respectively (fig. 2a+b). immature neutrophils (cluster 6) expressed cd24, pglyrp1, defa3 and defa4, whereas neutrophil cluster 5 expressed fcgr3b (cd16b), cxcl8, and lcn2 (lipocalin 2) (fig. 2c, fig. s2a). their migration within the pbmc fraction on a density gradient marked these cells as low-density neutrophils (ldn).
in the second cohort, pbmc from 17 covid-19 patients (8 mild, 9 severe, table s1), sampled between 2 and 25 days after symptom onset, and 13 controls, were collected for scrna-seq on a microwell-based platform (bd rhapsody). high-quality single-cell transcriptomes for 139,848 pbmc were assessed and their population structure was visualized using umap (fig. 2d, table s4).
data-driven cell type classification (aran et al., 2019) and cluster-specific marker gene expression identified all cell types expected in the pbmc compartment and revealed additional clusters and substructures (fig. 2d+s2b). similar to cohort 1, monocytes exhibited significant plasticity and were subclassified into 5 clusters (fig. 2d, clusters 0-4). disease severity-associated changes seen in cohort 1 were validated in cohort 2 (fig. 2e). immature and mature neutrophil clusters were detected in both cohorts (clusters 5-6) and showed near identical marker gene expression (fig. 2c). similar to cohort 1, a prominent shift in subpopulation occupancy was observed in the monocyte clusters (fig. 2d+e).
based on the union of the top 50 genes for monocyte and neutrophil clusters, we found a high correlation between the independently defined functional states within the monocyte compartment, and mature and immature neutrophils in cohort 1 and cohort 2 (fig. s2c). violin plot representation of important marker genes illustrated distinct phenotypic states and underscored the high similarity of the two cohorts (fig. s2d).
disease-severity dependent alterations of the monocyte compartment and the appearance of two ldn populations were detected in two cohorts of covid-19 patients.
predominance of hla-dr hi cd11c hi inflammatory monocytes in mild and hla-dr lo cd11c lo cd226 + cd69 + monocytes in severe covid-19
the monocyte compartment is particularly affected by covid-19, indicated by a loss of cd14 lo cd16 hi non-classical monocytes (fig. 1c+d). disease severity-dependent shifts in monocyte activation were identified by scrna-seq (fig. 2). we further explored the phenotypic alterations of the monocyte compartment using mass cytometry (fig. 3a,c,d).
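the cross-cohort comparison described above (union of the top 50 genes per cluster, then correlation of cluster states between cohorts) can be sketched as follows. this is a minimal illustration, not the authors' pipeline: the text does not state the correlation measure or the expression summary, so pearson correlation on per-cluster mean expression is an assumption, and all function names and the toy values are hypothetical.

```python
from math import sqrt


def top_genes(profile, n):
    """Names of the n genes with the highest value in one cluster profile."""
    return set(sorted(profile, key=profile.get, reverse=True)[:n])


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def cross_cohort_correlation(cohort_a, cohort_b, n_top=50):
    """Correlate every cluster of cohort_a with every cluster of cohort_b.

    Each cohort maps cluster name -> {gene: mean expression} (assumed input).
    The gene set is the union of the top n_top genes over all clusters.
    """
    genes = set()
    for profile in list(cohort_a.values()) + list(cohort_b.values()):
        genes |= top_genes(profile, n_top)
    genes = sorted(genes)
    return {
        (name_a, name_b): pearson(
            [prof_a.get(g, 0.0) for g in genes],
            [prof_b.get(g, 0.0) for g in genes],
        )
        for name_a, prof_a in cohort_a.items()
        for name_b, prof_b in cohort_b.items()
    }


# Toy, made-up mean expression values for illustration only.
toy_cohort_1 = {
    "hla_dr_hi": {"HLA-DRA": 5.0, "CD83": 3.0, "S100A12": 0.5, "CXCL8": 0.2},
    "hla_dr_lo": {"HLA-DRA": 0.5, "CD83": 0.3, "S100A12": 4.0, "CXCL8": 3.0},
}
toy_cohort_2 = {
    "hla_dr_hi": {"HLA-DRA": 4.5, "CD83": 3.5, "S100A12": 0.4, "CXCL8": 0.3},
    "hla_dr_lo": {"HLA-DRA": 0.4, "CD83": 0.2, "S100A12": 4.5, "CXCL8": 2.5},
}
toy_result = cross_cohort_correlation(toy_cohort_1, toy_cohort_2)
```

in such a comparison, matching states in the two cohorts are expected to correlate strongly and mismatched states weakly or negatively, which is the pattern the text reports for fig. s2c.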
increased levels of activated hla-dr hi cd11c hi monocytes in mild covid-19 patients were confirmed by mcfc in cohort 2 (fig. 3e). in severe covid-19, we detected increased expression of cd226 and cd69 (cluster 10) and/or decreased expression of hla-dr, and total cd226 + cd69 + monocytes were elevated compared to controls. cluster 10 expressed high levels of cd10, which is induced during macrophage differentiation (huang et al., 2020b). thus, an alternative activation pattern of classical monocytes appeared to be covid-19 specific and was associated with severe disease. in addition to activated lymphocytes, monocytes also upregulate cd69 expression (davison et al., 2017), which promotes tissue infiltration and retention (cibrián and sánchez-madrid, 2017). similarly, cd226 expression on alternatively activated monocytes might also promote diapedesis and tissue infiltration (reymond et al., 2004). together, this activation pattern may contribute to the reduction of circulating monocytes in covid-19.
hla-dr lo monocytes persist in severe covid-19
next, we dissected covid-19 associated phenotypic alterations of monocytes by scrna-seq. marker genes of the monocyte clusters derived from fig. 2a showed that mild covid-19 associated clusters 1 and 2 were characterized by an isg-driven transcriptional program (fig. s3a), and gene ontology enrichment analysis (goea) assigned these clusters to 'type i interferon signaling pathway' (fig. s3b). a monocyte cluster marked by low expression of hla-dr and high expression of s100a12 and cxcl8 (cluster 3, hla-dr lo s100a hi) was strongly associated with severe covid-19 (fig. s3a, 2b, s2d). for further in-depth analysis, we subclustered the monocyte compartment of the pbmc dataset of cohort 2 (fig. 2d, s3c, table s1), resulting in 7 subclusters (fig. 4a).
cluster 1 was marked by high expression of hla-dra and hla-drb1 and co-stimulatory molecule cd83 and was therefore designated hla-dr hi cd83 hi activated inflammatory monocytes (fig. 4a+b, s3d+e). we identified two major clusters (0, 2) and a smaller cluster 6 with low hla-dr expression, which were associated with severe covid-19 (fig. 4b, s3d+e). low hla-dr expression is an established surrogate marker of monocyte dysfunction (venet et al., 2020), which results in reduced responsiveness to microbial stimuli (veglia et al., 2018), suggesting that cluster 0 and 6 are composed of dysfunctional monocytes. genes of the s100a family were expressed in both hla-dr lo clusters (fig. 4b), albeit to a higher degree in cluster 0 (hla-dr lo s100a hi, e.g. s100a12, fig. s2d, s3e, well as pre-maturation markers like mpo and plac8 (fig. 4b), recently linked to immature monocyte states in sepsis patients (reyes et al., 2020). in line with these findings, clusters 0, 2 and 6 were significantly enriched in a gene signature derived from sepsis-associated monocytes (fig. 4c) (reyes et al., 2020). moreover, blood monocytes isolated from covid-19 patients showed a blunted cytokine response to lps stimulation, particularly monocytes from patients with severe covid-19 (fig. 4d). accordingly, hla-dr lo monocyte clusters (0, 2, 6) were detected almost exclusively in severe covid-19 (fig. 4e). we next analyzed time-dependent cluster occupancies per patient in cohort 2 (fig. 4e+f). activated hla-dr hi cd83 hi monocytes (cluster 1) were found in all cases of mild covid-19, even at late time points (fig. 4e+f). in contrast, hla-dr lo cd163 hi monocytes (cluster 2) were present mainly early in severe disease, while hla-dr lo s100a hi monocytes (cluster 0) dominated the late phase of disease (fig. 4e+f). violin plots of isg (fig. s3d) and visualization of marker genes (fig.
s3e) indicated differential expression patterns of ifn signature genes in individual monocyte clusters. to reveal the kinetics of isg expression, we plotted the expression of isg15 and ifi6 in the complete monocyte population for all patients that had been sampled at least twice (fig. 4g). expression levels were highest at early time points and consistently decreased over time, clearly indicating that the ifn response in covid-19 is inversely linked to disease severity and time (fig. s3f+g). in contrast, decreased expression of hla-dra and hla-drb1 in severe covid-19 is evident early on and sustained over time.
transcription factor prediction indicated a stat signaling-driven gene expression program in monocytes in covid-19 (fig. 4h)
neutrophils and the remaining clusters as mature neutrophils (fig. s4a). accordingly, pro- and pre-neutrophils were enriched for transcriptional signatures of neutrophil progenitors derived from published single-cell data (
and pro-neutrophils in cluster 4 and 6 showed the highest proportion of cells with a proliferative signature (fig. s4b). clusters 0, 1, 2 (originally in cluster 4 in fig. 2a) expressed mature neutrophil markers fcgr3b (cd16) and mme (cd10) (fig. s4a).
including cd24, olfm4, lcn2, and bpi, previously associated with poor outcome in sepsis (fig. 5b, s4a) (kangelaris et al., 2015).
all ldns also expressed high levels of alarmins s100a8 and s100a9 (fig. 5d), whereas other s100 genes (e.g.
s100a4, s100a12) were strongly induced in selected neutrophil
alterations of the neutrophil compartment were further interrogated by mass cytometry of whole blood samples of covid-19 patients (n=8 mild + 9 severe, cohort 1), fli patients (n=8), and age- and gender-matched controls (n=9) (table s1), using a panel designed to detect myeloid cell maturation and activation states as well as markers of immunosuppression or dysfunction (table s2). unsupervised clustering analysis of all neutrophils in all samples revealed 10 major clusters (fig. 6a) of immature (cluster 2, 5, 6, 7), mature (cluster 1, 3, 4) and remaining clusters of low abundance (cluster 8, 9, 10). based on their differential expression of cd11b, cd16, cd24, cd34 and cd38, clusters 5 and 6 were identified as pro-neutrophils and cluster 2 as pre-neutrophils (kwok et al., 2020; ng et al., 2019). the fourth immature cell cluster (7) showed very low expression of cd11b and cd16, reminiscent of pro-neutrophils, but lacking cd34, cd38 and cd24 (fig. 6a), suggesting a hitherto unappreciated pro-neutrophil-like population. the mature neutrophils segregated into non-activated (cluster 1), partially activated (cluster 3) and highly activated cells (cluster 4), based on the loss of cd62l and upregulation of cd64, as well as signs of proliferative activity (ki67 +) (fig. 6a).
neutrophils from covid-19 patients clearly separated from those of controls and also fli patients in umap analysis (fig. 6b), and neutrophils in patients with severe covid-19 were distinct from those of patients with mild disease (fig. 6b). cells from control donors accumulated in areas enriched for mature non-activated cells (cluster 1) and immature pre-neutrophil-like cells (cluster 2). in contrast, neutrophils from fli patients were mainly mature non-activated (cluster 1) and mature highly activated (cluster 4) cells.
neutrophils from covid-19 patients, particularly from those with severe disease, primarily occupied immature pre- and pro-neutrophil-like clusters. plotting cell cluster-specific surface marker expression onto the umaps (fig. 6c) as well as statistical analyses of cell cluster distribution and surface marker expression among different patient groups supported these observations (fig. 6d+e). samples from fli patients contain a high proportion of highly activated mature neutrophils, but barely any immature neutrophils. in contrast, severe covid-19 is associated with the appearance of immature pre- and pro-neutrophils (fig. 6d+6e). interestingly, immature cell clusters in severe covid-19 showed signs of recent activation like upregulation of cd64 (mortaz et al., 2018), rank and rankl (riegel et al., 2012), as well as reduced cd62l expression (mortaz et al., 2018). in addition to loss of cd62l, immature and mature neutrophils from severe covid-19 showed elevated pd-l1 expression compared to control samples (fig. 6e)
we next assessed the dynamics of the changes within the myeloid cell compartment over time. we grouped samples according to collection time as 'early' (within the first 10 days) or 'late' (during the following 20 days) after onset of symptoms. in both cohorts, we observed a tendency towards (cohort 1) or significantly higher (cohort 2) proportions of granulocytes in severe vs. mild covid-19 patients, both at early and late time points (fig. s5a). we observed a persistent release of immature neutrophils (e.g. cluster 6) in severe covid-19 (fig. s5b) showing high expression of cd64 and pd-l1, but downregulation of cd62l as a sign of activation, dysfunction and immunosuppression (fig. s5c). in addition, severe covid-19 patients show further increased frequencies of mature, partially activated neutrophils (cluster 3) at later time periods (fig. s5b).
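the early/late grouping described above can be written as a small helper. the cut-offs follow the text (the first 10 days, then the following 20 days); treating day 10 as 'early', day 30 as the last 'late' day, and returning none outside that window are assumptions, as are the function names and the sample tuples.

```python
from collections import defaultdict


def collection_phase(days_after_onset):
    """Classify a sample by days after symptom onset.

    Assumed boundaries: day 0-10 -> "early", day 11-30 -> "late",
    anything later -> None (outside the window described in the text).
    """
    if days_after_onset < 0:
        raise ValueError("days after symptom onset cannot be negative")
    if days_after_onset <= 10:
        return "early"
    if days_after_onset <= 30:
        return "late"
    return None


def group_by_phase(samples):
    """Group (sample_id, days_after_onset) pairs by collection phase."""
    groups = defaultdict(list)
    for sample_id, day in samples:
        groups[collection_phase(day)].append(sample_id)
    return dict(groups)


# Hypothetical sample identifiers, for illustration only.
example = group_by_phase([("s1", 4), ("s2", 13), ("s3", 29)])
```

keeping the boundary rule in one function makes it easy to see how a different convention for day 10 or day 30 would move samples between groups.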
thus, the neutrophil compartment of severe covid-19 patients is characterized by a combination of persistent signs of inflammation and immunosuppression, which is reminiscent of long-term post-traumatic complications (hesselink et al., 2019).
we also analyzed time-dependent phenotypic changes in the monocyte compartment by mass cytometry. non-classical monocytes started to recover in covid-19 patients during the later stages of the disease (fig. s5a). hla-dr hi cd11c hi monocyte cell clusters also declined at later time points in mild covid-19 (fig. s5d,e,f), which correlates well with the longitudinal changes of ifi6 and isg15 as well as hla-dra and hla-drb1 expression profiles (fig. 4g+s3f). in contrast, overall proportions of hla-dr hi cd11c hi monocytes in severe covid-19 remained low throughout the course of the disease. proportions of cd10 hi macrophage-like cluster 10 and cd226 + cd69 + monocytes were generally higher at later stages in severe covid-19, which resembled the kinetics of hla-dr lo s100a hi monocytes identified by scrna-seq (fig. 4f). this indicates a prolonged alternative activation of monocytes in severe covid-19 (fig. s5e).
table s1). integrated visualization of all samples of cohort 2 (fresh/frozen pbmc, fresh whole blood, 229,731 cells, fig. s6a) revealed the expected blood leukocyte distribution, including granulocytes (fig. 7a, s6a, table s4). cell type distribution identified by scrna-seq profiles (fig. s6b) strongly correlated with mcfc characterization of the same samples (fig. s6c). for further analysis of the granulocyte compartment, we first combined the whole blood samples with the fresh pbmc to guide the clustering of all major immune cells resulting in a total of 122,954 cells (fig. 7a).
from these samples, we identified all neutrophil clusters and extracted the cells derived from whole blood for subsampling, which revealed a structure of 9 clusters (n=58,383 cells, fig. 7b+c).
using marker- and data-driven approaches as applied to ldn (fig. 5d, s4a), we identified fut4(cd15) + cd63 + cd66b + pro-neutrophils, itgam(cd11b) + cd101 + pre-neutrophils, along with 7 mature neutrophil clusters (fig. 7b-d, s6d, table s4). heterogeneous expression of various markers involved in mature neutrophil function, including cxcr2, fcgr2a (cd32), fcgr1a (cd64) or mme (cd10), pointed towards distinct functionalities within the neutrophil compartment (fig. 7e, s6d+e). seven of the nine neutrophil clusters identified in whole blood in cohort 2 could also be mapped to the fresh pbmc transcriptomes in cohort 1 (fig. s6f), indicating that scrna-seq of fresh pbmc in covid-19 patients reveals relevant parts of the neutrophil space. the transcriptional phenotype of pro- and pre-neutrophils (cluster 8+9) was corroborated in cohort 2 (fig. 7b-d, s6d).
heatmap and umap visualization of the cell type distribution identified pro- and pre-neutrophils mainly at late time points in severe covid-19 (fig. 7f-g). furthermore, mature neutrophils with a high ifn-signature (cluster 1) were associated with severe covid-19 (fig. 7e, s6g). this cluster was also enriched for markers identified by cytof as differentially expressed in patients with severe covid-19 (fig. 6), such as elevated expression of cd274 (pd-l1) and fcgr1a (cd64) (fig. 7h). in addition to cd274, cells in cluster 1 expressed genes indicative of a potentially suppressive or anti-inflammatory state, including zc3h12a (fig. 7e), which is known to suppress hepatitis c virus replication and virus-induced pro-inflammatory cytokine production (lin et al., 2014).
cluster 2 was also enriched for cells from covid-19 patients, mainly from severe but also mild cases (fig. 7f
is mainly driven by e2f family members and pre-neutrophils mainly depend on ets tfs (fig. s6h).
pseudotime analysis strongly supported the differentiation trajectory from pro-neutrophils (cluster 8) via pre-neutrophils (cluster 6) to mature neutrophils in cluster 2 and 1 (fig. s6i-j). particularly cd274 (pd-l1) was enriched in cluster 1 compared to cluster 2, supporting the potential of neutrophils to progress towards a suppressive phenotype in severe covid-19 (fig. s6j). interestingly, cd177 is expressed in pre-neutrophils and persists in cluster 1, further highlighting the newly emerging character of this cluster (volkmann et al., 2020).
finally, we studied whether the persistent emergence of immature, potentially dysfunctional neutrophils in severe covid-19 patients can be captured under routine diagnostic conditions. therefore, samples of 32 covid-19 patients (table s1, cohort 1) were characterized by routine hematology analyses using a clinical flow cytometry system (sysmex analyzer). indeed, the assumption of rescue myelopoiesis in severe covid-19 was supported by significantly higher counts in the population of immature granulocytes (ig, representing promyelocytes, myelocytes, and metamyelocytes) in this patient group (fig. 7k). we also found significant differences in the neutrophil compartment when analyzing the width of dispersion with respect to granularity, activity, and cell volume, defined as ne-wx, ne-wy and ne-wz, respectively. as compared to patients with mild course, severely ill patients displayed increases in width of dispersion of activity and cell volume as surrogates for increased cellular heterogeneity, immaturity and dysregulation in severe covid-19 (fig.
7k), resembling previously described alterations in sepsis patients (stiel et al., 2016). furthermore, neutrophils of severe covid-19 patients were partially dysfunctional, as their oxidative burst upon stimulation with standardized stimuli (e. coli or pma) was strongly impaired in comparison to control and mild covid-19 neutrophils, whereas phagocytic activity was preserved (fig. 7l, table s1).

collectively, the neutrophil compartment in peripheral blood of severe covid-19 patients is characterized by the appearance of ldn, fut4(cd15)+ cd63+ cd66b+ pro-neutrophils, and itgam(cd11b)+ cd101+ pre-neutrophils, reminiscent of emergency myelopoiesis, as well as cd274(pd-l1)+ zc3h12a+ mature neutrophils reminiscent of gmdsc-like cells, which might exert suppressive or anti-inflammatory functions.

dysfunctional phenotype, plac8 was recently shown to suppress production of il-1β and il-18 (segawa et al., 2018). in fact, we observed that inflammatory cytokine production, including il-1β release, was impaired in monocytes from patients with severe covid-19 (fig. 4).

pbmc fractions in severe covid-19 contained immature neutrophils, including pro- and pre-neutrophils, which was not observed in mild cases (fig. 5). these immature ldn showed a surface marker and gene expression profile reminiscent of granulocytic mdscs, including genes such as s100a12, s100a9, mmp8, arg1 (uhel et al., 2017), and olfm4, which has recently been associated with immunopathogenesis in sepsis (alder et al., 2017). emergence of pro-neutrophils in severe covid-19 was also detected by single-cell proteomics on whole blood samples. strikingly, both immature and mature neutrophils showed increased expression of cd64 and pd-l1 (fig. 6+s5), similar to recently described alterations in sepsis (meghraoui-kheddar et al., 2020).
in addition to the altered phenotype, we also observed an altered functionality. neutrophils from patients with severe covid-19 showed an impaired oxidative burst response, while their phagocytic capacity was preserved (fig. 7). single-cell transcriptomics of whole blood samples revealed mature activated neutrophils in both mild and severe covid-19 (fig. 7b, cluster 2); however, expression of cd274 (pd-l1) was only found in severe covid-19 (cluster 1), and it increased in later stages of the disease. expression of pd-l1 on neutrophils has been associated with t cell suppression (bowers et

methodology: js-s, dp, tk, sb, lb, edd, mg, dw, mb, tsk, as, od, hm, ars, cc, dk, ev, cjx, ad, ct, sh, clg, ml, ew, tu, mb, rg,

(table s4). (table s4). within the monocyte space of cohort 1 (related to fig. 2, table s4). cluster ranked by adjusted p-values.
c, back-mapping of monocyte clusters of cohort 2 (fig. 4c) onto the pbmc umap of cohort 2 (fig. 2d). the legend shows the association of the colors to the clusters together with the labeling of the clusters based on expressed marker genes (according to fig. 2 and fig. s3d-f).
d, violin plots of marker gene expression in the monocyte clusters identified in the complete pbmc space of cohort 2 (fig. 2c,d).
e, dot plot of the top 10 marker genes sorted by average log fold change calculated for the monocyte clusters (fig. 4c).

severe covid-19 patients (figure 1a+b, table s1). information on age, sex, medication, and co-morbidities is listed in covid-19 patients (figure 1a+b, table s1) were included in the study. in patients who were not able to consent at the time of study enrollment, consent was obtained after recovery. information on age, sex, medication, and co-morbidities is listed in table s1.

covid after one wash in dpbs cells were directly processed for scrna-seq (bd rhapsody) or multi-color flow cytometry (mcfc).
frozen pbmc were recovered by rapidly thawing frozen cell suspensions in a 37°c water bath followed by immediate dilution in pre-warmed rpmi-1640+10% fbs (gibco) and centrifugation at 300 g for 5 min. after centrifugation, the cells were resuspended in rpmi-1640+10% fbs and processed for scrna-seq. antibody cocktails were cryopreserved as described before (schulz et al., 2019).

all anti-human antibodies pre-conjugated to metal isotopes were obtained from fluidigm corporation (san francisco, us). all remaining antibodies were obtained from the indicated companies as purified antibodies and in-house conjugation was done using the maxpar x8 labeling kit (fluidigm).

tlrpure; innaxon, uk). after stimulation, cell-free supernatants were collected and tested for il-1β, ifnγ, and tnfα, respectively, using the cytokine bead assay legend-plex mix&match test was used to report differences in ig count, whereas mixed-effect analysis and sidak's multiple comparison test were applied to report statistical differences of ne-wx, ne-wy and ne-wz between mild and severe covid-19 patients.

10x genomics chromium single-cell rna-seq

pbmc were isolated and prepared as described above. afterwards, patient samples were hashtagged with totalseq-a antibodies (biolegend) according to the manufacturer's protocol for totalseq™-a antibodies and cell hashing with 10x single cell 3' reagent kit v3.1. 50 µl of cell suspension containing 1x10^6 cells was resuspended in staining buffer (2% bsa, jackson immuno research; 0.01% tween-20, sigma-aldrich; 1x dpbs, gibco) and 5 µl human trustain fcx™ fc blocking reagent (biolegend) was added. the blocking was performed for 10 min at 4°c. in the next step, 1 µg of a unique totalseq-a antibody was added to each sample and incubated for 30 min at 4°c. after the incubation time, 1.5 ml staining buffer was added and the cells were centrifuged for 5 min at 350 g and 4°c.
washing was repeated for a total of 3 washes. subsequently, the cells were resuspended in an appropriate volume of 1x dpbs (gibco), passed through a 40 µm mesh (flowmi™ cell strainer, merck) and counted using a neubauer hemocytometer (marienfeld). cell counts were adjusted and hashtagged cells were pooled equally. the cell suspension was super-loaded, with 50,000 cells, in the chromium™ controller for partitioning single cells into nanoliter-scale gel bead-in-emulsions (gems). single cell 3' reagent kit v3.1 was used for reverse transcription, cdna amplification and library construction of the gene expression libraries (10x genomics) following the detailed protocol provided by 10x genomics. hashtag libraries were prepared according to the cell hashing protocol for 10x single cell 3' reagent kit v3.1 provided by biolegend, including primer sequences and reagent specifications. a biometra trio thermal cycler was used for amplification and incubation steps (analytik jena). libraries were quantified by qubit™ 2.0 fluorometer (thermofisher) and quality was checked using a 2100 bioanalyzer with high sensitivity dna kit (agilent). sequencing was performed in paired-end mode with an s1 and s2 flow cell (2×50 cycles) using a novaseq 6000 sequencer (illumina).

the qubit dsdna hs kit (thermofisher) and the size-distribution was measured using the agilent high sensitivity d5000 assay on a tapestation 4200 system (agilent technologies). sequencing was performed in paired-end mode (2×75 cycles) on a novaseq 6000 and nextseq 500 system (illumina) with novaseq 6000 s2 reagent kit (200 cycles) and nextseq 500/550 high output kit v2.5 (150 cycles) chemistry, respectively.

data pre-processing of 10x genomics chromium scrna-seq data

cellranger v3.1.0 (10x genomics) was used to process scrna-seq.
to generate a digital gene expression (dge) matrix for each sample, we mapped their reads to a combined reference of the grch38 genome and the sars-cov-2 genome and recorded the number of umis for each gene in each cell.

data pre-processing of bd rhapsody scrna-seq data

after demultiplexing of bcl files using bcl2fastq2 v2.20 from illumina and quality control, paired-end scrna-seq reads were filtered for valid cell barcodes using the barcode whitelist provided by bd. cutadapt 1.16 was then used to trim nexterape-pe adapter sequences where needed and to filter reads for a phred score of 20 or above (martin, 2011). then, star 2.6.1b was used for alignment against the gencode v27 reference genome (dobin et al implemented in seurat.

we excluded cells based on the following quality criteria: more than 25% mitochondrial reads, more than 25% hba/hbb gene reads, fewer than 250 expressed genes or more than 5,000 expressed genes, and fewer than 500 detected transcripts. we further excluded genes that were expressed in fewer than five cells. in addition, mitochondrial genes were excluded from further analysis.

lognormalization (seurat function) was applied before downstream analysis. the original gene counts for each cell were normalized by total umi counts, multiplied by 10,000 (tp10k) and then log transformed by log10(tp10k+1).

after normalization, the count data were scaled, regressing for total umi counts, and principal component analysis (pca) was performed based on the 2,000 most variable features identified using the vst method implemented in seurat. subsequently, the scrna-seq data from cohort 1 were integrated with publicly available 10x scrna-seq data of healthy controls using the 'harmony' algorithm (korsunsky et benchmarking data from healthy controls and 10x v3.1 scrna-seq data from cohort 1).
we downloaded the count matrices for the publicly available scrna-seq data and filtered the cells using the above-mentioned quality criteria prior to data integration. for two-dimensional data visualization, we performed umap based on the first 20 dimensions of the 'harmony' data reduction. the cells were clustered using the louvain algorithm based on the first 20 'harmony' dimensions with a resolution of 0.4.

differential expression (de) tests were performed using the findmarkers/findallmarkers functions in seurat with the wilcoxon rank sum test. genes with >0.25 log-fold changes, expressed in at least 25% of the cells in the tested groups, and with bonferroni-corrected p-values <0.05 were regarded as significantly differentially expressed genes (degs). cluster marker genes were identified by applying the de tests for upregulated genes between cells in one cluster and all other clusters in the dataset. top ranked genes (by log-fold changes) from each cluster of interest were extracted for further illustration. the exact number and definition of samples used in the analysis are specified in the legend of fig. 2a and summarized in table s1.

clusters were annotated based on a double-checking strategy: 1) by comparing cluster marker genes with public sources, and 2) by directly visualizing the expression pattern of cytof marker genes.

significant degs between each monocyte cluster and the rest of the monocyte subpopulations were identified by the findmarkers function from the seurat package using wilcoxon rank sum test statistics for genes expressed in at least 25% of all monocyte clusters. p-values were corrected for multiple testing using bonferroni correction, and genes with corrected p-values lower than or equal to 0.05 were taken as significant degs for the go enrichment test by the r package clusterprofiler v.3.10.1 (yu et al., 2012).
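The three marker-gene thresholds stated above (log-fold change >0.25, expression in at least 25% of cells, Bonferroni-corrected p-value <0.05) can be illustrated with a minimal Python sketch. This is not the Seurat implementation used in the study: the `GeneTest` record, the function names, the choice of `n_tests`, and the toy numbers are all hypothetical and serve only to show how the filter combines the criteria.

```python
from dataclasses import dataclass


@dataclass
class GeneTest:
    gene: str        # gene symbol (hypothetical placeholder names below)
    log_fc: float    # log fold change, cluster vs. all other cells
    pct_expr: float  # fraction (0..1) of cells in the tested group expressing the gene
    p_value: float   # raw p-value, e.g. from a Wilcoxon rank sum test


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjust one p-value for n_tests comparisons, capped at 1."""
    return min(1.0, p_value * n_tests)


def select_degs(tests, n_tests, min_log_fc=0.25, min_pct=0.25, alpha=0.05):
    """Keep genes passing all three thresholds, ranked by log fold change."""
    hits = [
        t for t in tests
        if t.log_fc > min_log_fc
        and t.pct_expr >= min_pct
        and bonferroni(t.p_value, n_tests) < alpha
    ]
    return sorted(hits, key=lambda t: t.log_fc, reverse=True)


# Toy, made-up numbers for illustration only (not data from the study).
example = [
    GeneTest("GENE_A", 1.20, 0.80, 1e-9),
    GeneTest("GENE_B", 0.10, 0.90, 1e-12),  # fails the fold-change cutoff
    GeneTest("GENE_C", 0.60, 0.10, 1e-8),   # fails the expression cutoff
    GeneTest("GENE_D", 0.40, 0.50, 1e-3),   # fails after Bonferroni correction
    GeneTest("GENE_E", 2.00, 0.30, 1e-7),
]
degs = select_degs(example, n_tests=20000)
print([t.gene for t in degs])
```

The number of tests used for the correction is left as an explicit argument because it depends on how many genes enter the comparison, which the text does not state.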
to systematically compare the similarity of marker gene expression in the identified monocyte/neutrophil subpopulations between the two cohorts, the spearman correlation coefficients were calculated based on the union of the top 50 marker genes of each cluster sorted by fold change in the two cohorts, based on their average expression across all cells in the specific subpopulation. pairwise comparisons were performed, and the correlation coefficients were displayed using a heatmap.

the neutrophil space was investigated by subsetting the pbmc dataset to those clusters identified as neutrophils and immature neutrophils (cluster 5 and 6). within those subsets, we selected the top 2,000 variable genes and repeated the clustering using the snn-graph based louvain algorithm mentioned above with a resolution of 0.6. the dimensionality of the data was then reduced to 10 pcs, which served as input for the umap calculation. to categorize the observed neutrophil clusters into the respective cell cycle states, we applied the cellcyclescoring function of seurat and visualized the results as pie charts.

a gene signature enrichment analysis using the 'aucell' method (aibar et al., 2017) was applied to link observed neutrophil clusters to existing studies and neutrophils of cohort 2. we set the threshold for the calculation of the area under the curve (auc) to marker genes from collected publications and the top 30 of the ranked marker genes from each of the neutrophil clusters from cohort 2. the resulting auc values were normalized so that the maximum possible auc equals 1 and subsequently visualized in violin plots or umap plots.

scrna-seq umi count matrices were imported to r 3.6.2 and gene expression data analysis was performed using the r/seurat package 3.1.2 (butler et al., 2018). demultiplexing of cells was performed using the htodemux function implemented in seurat.
after identification of singlets, cells with more than 25% mitochondrial reads, fewer than 250 expressed genes or more than 5,000 expressed genes, and fewer than 500 detected transcripts were excluded from the analysis, and only those genes present in more than 5 cells were considered for downstream analysis. the following normalization, scaling and dimensionality reduction steps were performed independently for each of the data subsets used for the different analyses, as indicated respectively. in general, gene expression values were normalized by total umi counts per cell, multiplied by 10,000 (tp10k) and then log transformed by log10(tp10k+1). subsequently, the data were scaled, centered and regressed against the number of detected transcripts per cell to correct for heterogeneity associated with differences in sequencing depth. for dimensionality reduction, pca was performed on the top 2,000 variable genes identified using the vst method implemented in seurat. subsequently, umap was used for two-dimensional representation of the data structure. cell type annotation was based on the respective clustering results combined with data-driven cell type classification algorithms based on reference transcriptome data (aran et al., 2019) and expression of known marker genes.

scrna-seq count data of 229,731 cells derived from fresh and frozen pbmc samples purified by density gradient centrifugation and whole blood after erythrocyte lysis of cohort 2 (bonn, bd rhapsody) were combined, normalized and scaled as described above (see fig. s6a). after variable gene selection and pca, umap was performed based on the first 20 principal components (pcs). no batch correction or data integration strategies were applied to the data. visualization of the cells (fig. s6a) showed overlay of cells of the same type (e.g.
t cells clustered within the same cluster, irrespective of cell isolation procedure). in other words, cell type distribution was unaffected by the technical differences in sample handling. data quality and information content were visualized as violin plots showing the number of detected genes, transcripts (umis) and genic reads per sample handling strategy, split by pbmc and granulocyte fraction.

scrna-seq count data of 139,848 cells derived from fresh and frozen pbmc samples of cohort 2 (bonn, bd rhapsody) purified by density gradient centrifugation were normalized and scaled as described above. after variable gene selection and pca, umap was performed and the cells were clustered using the louvain algorithm based on the first 20 pcs and a resolution of 0.4. cluster identities were determined by reference-based cell classification and inference of cluster-specific marker genes using the wilcoxon rank sum test with the following cutoffs: genes have to be expressed in more than 20% of the cells of the respective cluster, exceed a logarithmic fold change cutoff of at least 0.2, and exhibit a difference of >10% in the detection between two clusters. the exact number and definition of samples used in the analysis are specified in the legend of fig. 2d and summarized in table s1.

to compare shifts in the monocyte and neutrophil populations in the pbmc compartment of covid-19 patients, the percentages of the cellular subsets (as identified by clustering and cluster annotation explained above for the two independent scrna-seq data sets, cohort 1 and cohort 2) of the total number of pbmc in each data set were quantified per sample and visualized together in box plots.
to determine the statistical significance of differences in cell proportions between the different conditions, a dirichlet regression model was used, because the proportions are not independent of one another. the r package dirichletreg (maier, 2014) was used. the p-values were corrected for multiple testing using the benjamini-hochberg procedure.

the monocyte space was investigated by subsetting the pbmc dataset to those clusters identified as monocytes (cluster 0-4), removing cells with strong multi-lineage marker expression, and repeating the variable gene selection (top 2,000 variable genes), regression for the number of umis and scaling as described above. the dimensionality of the data was then reduced to 8 pcs, which served as input for the umap calculation. the snn-graph based louvain clustering of the monocytes was performed using a resolution of 0.2. marker genes per cluster were calculated using the wilcoxon rank sum test with the following cutoffs: genes have to be expressed in >20% of the cells, exceed a logarithmic fold change cutoff of at least 0.25, and exhibit a difference of >10% in the detection between two clusters. the exact number and definition of samples used in the analysis are specified in the legend of fig. 4a and summarized in table s1.

for each patient and time point of sample collection, the proportional occupancy of the monocyte clusters was calculated, and the relative proportions were subsequently visualized as a function of time.

scrna-seq count data derived from fresh pbmc samples purified by density gradient centrifugation and whole blood after erythrocyte lysis of cohort 2 (bd rhapsody) were normalized, scaled, and regressed for the number of umis per cell as described above. after pca based on the top 2,000 variable genes, umap was performed using the first 30 pcs.
cell clusters were determined using louvain clustering implemented in seurat based on the first 30 principal components and a resolution of 0.8. cluster identities were assigned as detailed above using reference-based classification and marker gene expression. subsequently, the dataset was subsetted for whole blood samples after erythrocyte lysis and clusters identified as neutrophils and immature neutrophils, and re-scaled and regressed. after pca on the top 2,000 variable genes, the neutrophil subset data were further processed using the data integration approach implemented in seurat (stuart et al., 2019) based on the first 30 pcs, removing potential technical biases of separate experimental runs. umap and clustering were performed as described above on the top 12 pcs using a resolution of 0.3. differentially expressed genes between clusters were defined using a wilcoxon rank sum test for differential gene expression implemented in seurat. genes had to be expressed in >10% of the cells of a cluster and exceed a logarithmic fold change threshold of 0.1. the exact number and definition of samples used in the analysis are specified in the legend of fig. 7a and summarized in table s1.

after cell type classification of the combined scrna-seq data set of fresh pbmc and whole blood samples of cohort 2 described above, 89,883 cells derived from whole blood samples after erythrocyte lysis were subsetted. percentages of cell subsets in those whole blood samples of the total number of cells were quantified per sample and visualized in box plots separated by disease stage and group.

to categorize the cells within the neutrophil clusters into the respective cell cycle states, we applied the cellcyclescoring function of seurat and visualized the results as pie charts.

trajectory analysis was performed using the destiny algorithm v3.0.1 (angerer et al., 2016).
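The per-sample quantification of cell subsets described in this section (percentage of each cluster among all cells of a sample, later shown as box plots) amounts to a simple count-and-divide step. The sketch below shows that step in plain Python under stated assumptions: the input layout as `(sample_id, cluster_label)` pairs, the function name, and the toy labels are hypothetical and not taken from the study's code.

```python
from collections import Counter, defaultdict


def cluster_percentages(cells):
    """Compute, per sample, the percentage of cells assigned to each cluster.

    cells: iterable of (sample_id, cluster_label) pairs, one pair per cell.
    Returns {sample_id: {cluster_label: percent of that sample's cells}}.
    """
    counts = defaultdict(Counter)
    for sample_id, cluster in cells:
        counts[sample_id][cluster] += 1

    result = {}
    for sample_id, counter in counts.items():
        total = sum(counter.values())  # all cells of this sample
        result[sample_id] = {c: 100.0 * n / total for c, n in counter.items()}
    return result


# Toy, made-up assignments for illustration only.
toy_cells = [
    ("patient_1", "pro-neutrophil"),
    ("patient_1", "mature neutrophil"),
    ("patient_1", "mature neutrophil"),
    ("patient_1", "mature neutrophil"),
    ("patient_2", "mature neutrophil"),
    ("patient_2", "pre-neutrophil"),
]
percentages = cluster_percentages(toy_cells)
print(percentages["patient_1"])
```

Because the percentages within one sample always sum to 100, they are not independent of one another, which is why the text uses a dirichlet regression model rather than separate per-cluster tests when comparing conditions.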
to generate umap representations, all events of a given population of interest were down-sampled to 70,000 cells and then embedded using the tumap function (r uwot package, https://cran.r-project.org/package=uwot) parameterized by local neighborhood 50, learning rate 0.5, and using the indicated markers.

[figure residue: marker gene expression plot; gene labels include stat3, ifi27, isg15, mpo, elane, cd24, fut4, olfm4, cd177, fcgr1a, zc3h12a, cxcr2 and mme; legend: % exp.]
transcriptome meta-analysis deciphers a dysregulation in immune response-associated gene signatures during sepsis
scenic: single-cell regulatory network inference and clustering
is a candidate marker for a pathogenic neutrophil subset in septic shock
destiny: diffusion maps for large-scale single-cell data in r
reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage
gene ontology: tool for the unification of biology
the pathogenicity of sars-cov-2 in hace2 transgenic mice
targeting potential drivers of covid-19: neutrophil extracellular traps
myeloid-derived suppressor cell subsets drive glioblastoma growth in a sex-specific manner
human intestinal pro-inflammatory cd11chighccr2+cx3cr1+ macrophages, but not their tolerogenic cd11c-ccr2-cx3cr1- counterparts, are expanded in inflammatory bowel disease article
immune suppression by neutrophils in hiv-1 infection: role of pd-l1/pd-1 pathway
presence of sars-cov-2 reactive t cells in covid-19 patients and healthy donors
l-arginine metabolism in myeloid cells controls t-lymphocyte functions
recommendations for myeloid-derived suppressor cell nomenclature and characterization standards
integrating single-cell transcriptomic data across different conditions, technologies, and species
the gene ontology resource: 20 years and still going strong
deciphering myeloid-derived suppressor cells: isolation and markers in humans, mice and non-human primates
neutrophils which migrate to lymph nodes modulate cd4+ t cell response by a pd-l1 dependent mechanism
clinical and immunologic features in severe and moderate coronavirus disease
covid-19 severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis
ccl2 promotes colorectal carcinogenesis by enhancing polymorphonuclear myeloid-derived suppressor cell population and function
cd69: from activation marker to metabolic gatekeeper
from mice to monkeys, animals studied for coronavirus answers
mafb determines human macrophage anti-inflammatory polarization: relevance for the pathogenic mechanisms operating in multicentric carpotarsal osteolysis
neutrophils with myeloid derived suppressor function deplete arginine and constrain t cell function in septic shock patients
platelet, monocyte and neutrophil activation and glucose tolerance in
favorable anakinra responses in severe covid-19 patients with secondary
star: ultrafast universal rna-seq aligner
genomewide association study of severe covid-19 with respiratory failure
cd163 expression defines specific, irf8-dependent, immune-modulatory macrophages in the bone marrow
complex immune dysregulation in covid-19 patients with severe respiratory failure
targets of t cell responses to sars-cov-2 coronavirus in humans with covid-19 disease and unexposed individuals
complex heatmaps reveal patterns and correlations in multidimensional genomic data
impaired type i interferon activity and exacerbated inflammatory responses in severe covid-19 patients
normalization and variance stabilization of single-cell rna-seq data using regularized negative binomial regression
neutrophil heterogeneity and its role in infectious complications after severe trauma
engagement of cd83 ligand induces prolonged expansion of cd8+ t cells and preferential enrichment for antigen specificity
stroke-induced immunodepression and dysphagia independently predict stroke-associated pneumonia - the predict study
dexamethasone in hospitalized patients with covid-19 - preliminary report
simultaneous inference in general parametric models
clinical features of patients infected with 2019 novel coronavirus in wuhan
induced cd10 expression during monocyte-to-macrophage differentiation identifies a unique subset of macrophages in pancreatic ductal adenocarcinoma
should we stimulate or suppress immune responses in covid-19? cytokine and anti-cytokine interventions
gene list to a gene regulatory network using large motif and track collections
heterogeneity among septic shock patients in a set of immunoregulatory markers
human suppressive neutrophils cd16bright/cd62ldim exhibit decreased adhesion
toward understanding the origin and evolution of cellular organisms
increased expression of neutrophil-related genes in patients with early sepsis-induced ards
suppress lymphocyte proliferation through expression of pd-l1
incidence of thrombotic complications in critically ill icu patients with covid-19
fast, sensitive and accurate integration of single-cell data with harmony
web-based analysis and publication of flow cytometry experiments
immunologic perturbations in severe covid-19/sars-cov-2 infection
studying the pathophysiology of coronavirus disease 2019: a protocol for the berlin prospective covid-19 patient cohort (pa-covid-19)
age and gender leucocytes variances and references values generated using the standardized one-study protocol
combinatorial single-cell analyses of granulocyte-monocyte progenitor heterogeneity reveals an early uni-potent neutrophil progenitor
spleen-derived ifn-γ induces generation of pd-l1+-suppressive neutrophils during endotoxemia
immunophenotyping of covid-19 and influenza highlights the role of type i interferons in development of severe covid-19
least-squares means: the r package lsmeans
pad4 mediated histone hypercitrullination induces heterochromatin decondensation and chromatin unfolding to form neutrophil extracellular trap-like structures
arginine deficiency is involved in thrombocytopenia and immunosuppression in severe fever with thrombocytopenia syndrome
single-cell landscape of bronchoalveolar immune cells in patients with covid-19
the molecular signatures database hallmark gene set collection
mcpip1 suppresses hepatitis c virus replication and negatively regulates virus-induced proinflammatory cytokine responses
dysregulated myelopoiesis and hematopoietic function following acute physiologic insult
antibody responses to sars-cov-2 in patients with covid-19
longitudinal analyses reveal immunological misfiring in severe covid-19
ebola virus disease is characterized by poor activation and reduced levels of circulating cd16+ monocytes
single cell rna sequencing of human liver reveals distinct intrahepatic macrophage populations
dirichletreg: dirichlet regression for compositional data in r
cutadapt removes adapter sequences from high-throughput sequencing reads
validation of diagnostic gene sets to identify critically ill patients with sepsis
deep immune profiling of covid-19 patients reveals patient heterogeneity and distinct immunotypes with implications for therapeutic interventions
the innate immune system: fighting on the front lines or fanning the flames of covid-19?
two new immature and dysfunctional neutrophil cell subsets define a predictive signature of sepsis useable in clinical practice
barcoding of live human peripheral blood mononuclear cells for multiplexed mass cytometry
platinum-conjugated antibodies for application in mass cytometry
the cd14+hla-drlo/neg monocyte: an immunosuppressive phenotype that restrains responses to cancer immunotherapy
pathological inflammation in patients with covid-19: a key role for monocytes and macrophages
ultra-high-throughput clinical proteomics reveals classifiers of covid-19 infection
frequencies of circulating mdsc correlate with clinical outcome of melanoma patients treated with ipilimumab
neutrophil extracellular traps (nets) contribute to immunothrombosis in covid-19 acute respiratory distress syndrome
persisting low monocyte human leukocyte antigen-dr expression predicts mortality in septic shock
update on neutrophil function in severe inflammation
different phenotypes of non-classical monocytes associated with systemic inflammation, endothelial alteration and hepatic compromise in patients with dengue
heterogeneity of neutrophils
detection of sars-cov-2-specific humoral and cellular immunity in covid-19 convalescent individuals
cytof workflow: differential discovery in high-throughput high-dimensional cytometry datasets
a comprehensive single cell transcriptional landscape of human hematopoietic progenitors
immunopathogenesis of coronavirus infections: implications for sars
biological basis and pathological relevance of microvascular thrombosis
a subset of neutrophils in human systemic inflammation inhibits t cell responses through mac-1
decoding human fetal liver haematopoiesis
extracting a cellular hierarchy from high-dimensional cytometry data with spade
mortality rates of patients with covid-19 in the intensive care unit: a systematic review of the emerging literature
immunotherapies for covid-19: lessons learned from sepsis
an immune-cell signature of bacterial sepsis
dnam-1 and pvr regulate monocyte migration through endothelial junctions
human polymorphonuclear neutrophils express rank and are activated by its ligand, rankl
immunosuppression for hyperinflammation in covid-19: a double-edged sword?
convergent antibody responses to sars-cov-2 in convalescent individuals
differential redistribution of activated monocyte and dendritic cell subsets to the lung associates with severity of covid-19
hepatic acute-phase proteins control innate immune responses during infection by promoting myeloid-derived suppressor cell function
invariant nkt cells reduce the immunosuppressive activity of influenza a virus-induced myeloid-derived suppressor cells in mice and humans
regulatory cell therapy in kidney transplantation (the one study): a harmonised design and analysis of seven non-randomised, single-arm, phase 1/2a trials
human neutrophils in the saga of cellular heterogeneity: insights and open questions
myeloperoxidase can differentiate between sepsis and non-infectious sirs and predicts mortality in intensive care patients with sirs
emerging principles in myelopoiesis at homeostasis and during infection and inflammation
surface barcoding of live pbmc for multiplexed mass cytometry
minimizing batch effects in mass
production through regulation of autophagy and is associated with adult still disease
neutrophil diversity in health and disease
neutrophil fluorescence: a new indicator of cell activation during septic shock-induced disseminated intravascular coagulation
neutrophil activation during septic shock
comprehensive integration of single-cell data
human cd62ldim neutrophils identified as a separate subset by proteome profiling and in vivo pulse-chase labeling
interleukin-3 receptor in acute leukemia
leukocyte protease binding to nucleic acids promotes nuclear localization and cleavage of nucleic acid binding proteins
type i ifn immunoprofiling in covid-19 patients
early expansion of circulating granulocytic myeloid-derived suppressor cells predicts development of nosocomial infections in patients with sepsis
myeloid-derived suppressor cells coming of age review-article
myeloid cells in sepsis-acquired immunodeficiency
expression of dnam-1 (cd226) on inflammatory monocytes
kidney injury enhances renal g-csf expression and modulates granulopoiesis and human neutrophil cd177 in vivo
clinical characteristics of 138 hospitalized patients with coronavirus-infected pneumonia in wuhan, china
dysregulation of the immune response affects the outcome of critical covid-19 patients
a single-cell atlas of the peripheral immune response in patients with severe covid-19
pathological findings of covid-19 associated with acute respiratory distress syndrome
increased formation of neutrophil extracellular traps is associated with gut leakage in patients with type 1 but not type 2 diabetes
myeloid-derived suppressor cells: their role in the pathophysiology of hematologic malignancies and potential as therapeutic targets
clusterprofiler: an r package for comparing biological themes among gene clusters
clinical course and risk factors for mortality of adult inpatients with covid china: a retrospective cohort study
overly exuberant innate immune response to sars-cov-2 infection.
• sars-cov-2 infection induces profound alterations of the myeloid compartment
• mild covid-19 is marked by inflammatory hla-dr hi cd11c hi cd14+ monocytes
• dysfunctional hla-dr lo cd163 hi and hla-dr lo s100a hi cd14+ monocytes in severe covid-19
• emergency myelopoiesis with immature and dysfunctional neutrophils in severe covid-19

analysis of patients with mild and severe covid-19 reveals the presence of dysfunctional neutrophils in the latter that is linked to emergency myelopoiesis.

in brief, the neutrophil space was subsetted to only severe patients (early and late) and only the most prominent clusters of the latter (clusters 1, 2, 6, 8). the normalized data were scaled and regressed for umis, and a diffusion map was calculated based on the top 2,000 variable genes with a sum of at least 10 counts over all cells. based on the diffusion map, a diffusion pseudotime was calculated to infer a transition probability between the different cell states of the neutrophils. subsequently, the density of the clusters along the pseudotime and marker gene expression for each cluster were visualized. enrichment of gene sets was performed using the 'aucell' method (aibar et al., 2017) implemented in the package (version 1.4.1) in r. we set the threshold for the calculation of the auc to the top 3% of the ranked genes and normalized the maximum possible auc to 1. the resulting auc values were subsequently visualized in violin plots or umap plots.
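the gene-set scoring step described above (rank the genes of each cell, keep the top 3% of the ranking, and normalize the area under the recovery curve to a maximum of 1) can be illustrated with a minimal python sketch. this is not the aucell package's code: the function name `recovery_auc`, the step-function recovery curve, and the tie-breaking by gene name are assumptions made for illustration only.

```python
def recovery_auc(expression, gene_set, top_percent=3):
    """Simplified AUCell-style score for one cell (illustrative only).

    expression  -- dict mapping gene name -> expression value in this cell
    gene_set    -- iterable of gene names to score
    top_percent -- share of the ranking used for the AUC (3 in the text)
    """
    # Rank genes from highest to lowest expression; ties broken by name.
    genes = sorted(expression, key=lambda g: (-expression[g], g))
    # Integer ceiling of top_percent % of the number of ranked genes.
    max_rank = max(1, -(-len(genes) * top_percent // 100))
    rank = {g: i + 1 for i, g in enumerate(genes)}

    present = [g for g in set(gene_set) if g in rank]
    hits = [rank[g] for g in present if rank[g] <= max_rank]

    # Area under a step-function recovery curve up to max_rank:
    # a hit at rank r contributes to positions r..max_rank.
    auc = sum(max_rank - r + 1 for r in hits)

    # Largest achievable area: all set genes at the very top of the ranking.
    n = min(len(present), max_rank)
    best = sum(max_rank - i + 1 for i in range(1, n + 1))
    return auc / best if best else 0.0
```

with 100 ranked genes the threshold is rank 3, so a two-gene set occupying ranks 1 and 2 scores 1, and a set with no gene in the top three ranks scores 0.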
key: cord-294429-isivkz8b authors: grifoni, alba; weiskopf, daniela; ramirez, sydney i.; mateus, jose; dan, jennifer m.; moderbacher, carolyn rydyznski; rawlings, stephen a.; sutherland, aaron; premkumar, lakshmanane; jadi, ramesh s.; marrama, daniel; de silva, aravinda m.; frazier, april; carlin, aaron; greenbaum, jason a.; peters, bjoern; krammer, florian; smith, davey m.; crotty, shane; sette, alessandro title: targets of t cell responses to sars-cov-2 coronavirus in humans with covid-19 disease and unexposed individuals date: 2020-05-20 journal: cell doi: 10.1016/j.cell.2020.05.015 sha: doc_id: 294429 cord_uid: isivkz8b summary understanding adaptive immunity to sars-cov-2 is important for vaccine development, interpreting coronavirus disease 2019 (covid-19) pathogenesis, and calibration of pandemic control measures. using hla class i and ii predicted peptide 'megapools', circulating sars-cov-2−specific cd8+ and cd4+ t cells were identified in ∼70% and 100% of covid-19 convalescent patients, respectively. cd4+ t cell responses to spike, the main target of most vaccine efforts, were robust and correlated with the magnitude of the anti-sars-cov-2 igg and iga titers. the m, spike and n proteins each accounted for 11-27% of the total cd4+ response, with additional responses commonly targeting nsp3, nsp4, orf3a and orf8, among others. for cd8+ t cells, spike and m were recognized, with at least eight sars-cov-2 orfs targeted. importantly, we detected sars-cov-2−reactive cd4+ t cells in ∼40-60% of unexposed individuals, suggesting cross-reactive t cell recognition between circulating 'common cold' coronaviruses and sars-cov-2. covid-19 is a world-wide emergency. the first cases occurred in december 2019 and now more than 240,000 deaths and 3,000,000 cases of sars-cov-2 infection have been reported globally as of may 1st (dong et al., 2020; wu and mcgoogan, 2020).
vaccines against sars-cov-2 are just beginning development (amanat and krammer, 2020; thanh le et al., 2020). an understanding of human t cell responses to sars-cov-2 is lacking, due to the rapid emergence of the pandemic. there is an urgent need for foundational information about t cell responses to this virus. the first step toward such an understanding is the ability to quantify the virus-specific cd4+ and cd8+ t cells. such knowledge is of immediate relevance, as it will provide insights into immunity and pathogenesis of sars-cov-2 infection, and the same knowledge will assist vaccine design and evaluation of candidate vaccines. estimations of immunity are also central to epidemiological model calibration of future social distancing pandemic control measures (kissler et al., 2020). such projections are dramatically different depending on whether sars-cov-2 infection creates substantial immunity, and whether any crossreactive immunity exists between sars-cov-2 and circulating seasonal 'common cold' human coronaviruses. definition and assessment of human antigen-specific sars-cov-2 t cell responses are best made with direct ex vivo t cell assays using broad-based epitope pools and assays capable of detecting t cells of any cytokine polarization. herein, we have completed such an assessment with blood samples from covid-19 patients. there is also great uncertainty about whether adaptive immune responses to sars-cov-2 are protective or pathogenic, or whether both scenarios can occur depending on timing, composition, or magnitude of the adaptive immune response. hypotheses range the full gamut (peeples, 2020), based on available clinical data from severe acute respiratory syndrome (sars) or mers (alshukairi et al., 2018; wong et al., 2004; zhao et al., 2017) or animal model data with sars in mice (zhao et al., 2016; zhao et al., 2010; zhao et al., 2009), sars in nhps (liu et al., 2019; takano et al., 2008) or fipv in cats (vennema et al., 1990).
protective immunity, immunopathogenesis, and vaccine development for covid-19 are each briefly discussed below, to introduce the importance of defining t cell responses to sars-cov-2. based on data from sars patients in 2003-2004 (caused by sars-cov, the most closely related human betacoronavirus to sars-cov-2), and based on the fact that most acute viral infections result in development of protective immunity (sallusto et al., 2010), a likely possibility has been that substantial cd4+ t cell, cd8+ t cell, and neutralizing antibody responses develop to sars-cov-2 and all contribute to clearance of the acute infection; and, as a corollary, some of the t and b cells are retained long-term (i.e., multiple years) as immunological memory and protective immunity against sars-cov-2 infection (guo et al., 2020b; li et al., 2008). however, a contrarian viewpoint is also legitimate. while most acute infections result in the development of protective immunity, available data for human coronaviruses suggest the possibility that substantive adaptive immune responses can fail to occur (choe et al., 2017; okba et al., 2019; zhao et al., 2017) and robust protective immunity can fail to develop (callow et al., 1990). a failure to develop protective immunity could occur due to a t cell and/or antibody response of insufficient magnitude or durability, with the neutralizing antibody response being dependent on the cd4+ t cell response (crotty, 2019; zhao et al., 2016). thus, there is urgent need to understand the magnitude and composition of the human cd4+ and cd8+ t cell responses to sars-cov-2. if natural infection with sars-cov-2 elicits potent cd4+ and cd8+ t cell responses commonly associated with protective antiviral immunity, covid-19 is a strong candidate for rapid vaccine development. immunopathogenesis in covid-19 is a serious concern (cao, 2020; peeples, 2020).
it is most likely that an early cd4+ and cd8+ t cell response against sars-cov-2 is protective, but an early response is difficult to generate because of efficient innate immune evasion mechanisms of sars-cov-2 in humans (blanco-melo et al., 2020). immune evasion by sars-cov-2 is likely exacerbated by reduced myeloid antigen-presenting cell (apc) function or availability in the elderly (zhao et al., 2011). in such cases, it is conceivable that late t cell responses may instead amplify pathogenic inflammatory outcomes in the presence of sustained high viral loads in the lungs, by multiple hypothetical mechanisms (guo et al., 2020a; li et al., 2008; liu et al., 2019). critical (icu) and fatal outcomes are associated with elevated levels of inflammatory cytokines and chemokines, including il-6 (giamarellos-bourboulis et al., 2020; wong et al., 2004; zhou et al., 2020). vaccine development against acute viral infections classically focuses on vaccine-elicited recapitulation of the type of protective immune response elicited by natural infection. such foundational knowledge is currently missing for covid-19, including how the balance and the phenotypes of responding cells vary as a function of disease course and severity. such knowledge can guide selection of vaccine strategies most likely to elicit protective immunity against sars-cov-2. furthermore, knowledge of the t cell responses to covid-19 can guide selection of appropriate immunological endpoints for covid-19 candidate vaccine clinical trials, which are already starting. limited information is also available about which sars-cov-2 proteins are recognized by human t cell immune responses. in some infections, t cell responses are strongly biased towards certain viral proteins, and the targets can vary substantially between cd4+ and cd8+ t cells (moutaftsi et al., 2010; tian et al., 2019).
knowledge of sars-cov-2 proteins and epitopes recognized by human t cell responses is of immediate relevance, as it will allow for monitoring of covid-19 immune responses in laboratories worldwide. epitope knowledge will also assist candidate vaccine design and facilitate evaluation of vaccine candidate immunogenicity. almost all of the current covid-19 vaccine candidates are focused on the spike protein. a final key issue to consider in the study of sars-cov-2 immunity is whether some degree of cross-reactive coronavirus immunity exists in a fraction of the human population, and whether this might influence susceptibility to covid-19 disease. this issue is also relevant for vaccine development, as cross-reactive immunity could influence responsiveness to candidate vaccines (andrews et al., 2015). in sum, the ability to measure and understand the human cd4+ and cd8+ t cell responses to sars-cov-2 infection is a major knowledge gap currently impeding covid-19 vaccine development, interpretation of covid-19 disease pathogenesis, and calibration of future social distancing pandemic control measures. we recently predicted sars-cov-2 t cell epitopes utilizing the immune epitope database and analysis resource (iedb; vita et al., 2019). utilizing bioinformatic approaches, we identified specific peptides in sars-cov-2 with increased probability of being t cell targets (grifoni et al., 2020). we previously developed the megapool (mp) approach to allow simultaneous testing of large numbers of epitopes. by this technique, numerous epitopes are solubilized, pooled and relyophilized to avoid cell toxicity problems. these mps have been used in human t cell studies of a number of indications, including allergies (hinz et al., 2016), tuberculosis, tetanus, pertussis (bancroft et al., 2016; da silva antunes et al., 2017) and dengue virus, for both cd4+ and cd8+ t cell epitopes (grifoni et al., 2017; weiskopf et al., 2015).
here, we generated mps based on predicted sars-cov-2 epitopes. specifically, one mp corresponds to 221 predicted hla class ii cd4+ t cell epitopes (grifoni et al., 2020) covering all proteins in the viral genome, apart from the spike (s) antigen (cd4_r mp). the prediction strategy utilized is geared to capture ~50% of the total response (dhanda et al., 2018; paul et al., 2015) and was designed and validated to predict dominant epitopes independently of ethnicity and hla polymorphism. this approach takes advantage of the extensive cross-reactivity and repertoire overlap between different hla class ii loci and allelic variants to predict promiscuous epitopes, capable of binding many of the most common hla class ii prototypic specificities (greenbaum et al., 2011; o'sullivan et al., 1991; sidney et al., 2010a, b; southwood et al., 1998). for the spike protein, to ensure that all t cell reactivity against this important antigen can be detected, we generated a separate mp covering the entire antigen with 253 15-mer peptides overlapping by 10 residues (mp_s, table s1). as stated above, the mp used to probe the non-spike regions is expected to capture ~50% of the total response. the use of overlapping peptides spanning entire orfs instead allows for a more complete characterization, but also requires more cells. this factor should be kept in mind in terms of comparison of the magnitude of the cd4+ t cell responses to those pools. in the case of cd8 epitopes, since the overlap between different hla class i allelic variants and loci is more limited to specific groups of alleles, or supertypes (sidney et al., 2008), we targeted a set of the 12 most prominent hla class i a and b alleles, which together allow broad coverage (>85%) of the general population.
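the spike pool described above (15-mer peptides overlapping by 10 residues) amounts to sliding a 15-residue window along the protein in steps of 5. the python sketch below shows one way to do this. the function name `tile_peptides` and the choice to anchor a final window at the c-terminus are assumptions, as the exact tiling procedure is not given in this text. under those assumptions, a 1273-residue sequence (the length of the reference spike protein, which is also not stated here) yields 253 peptides, in line with the count quoted above.

```python
def tile_peptides(sequence, length=15, overlap=10):
    """Split a protein sequence into overlapping peptides (illustrative).

    Consecutive peptides share `overlap` residues. If the regular tiling
    does not reach the end of the sequence, one extra peptide anchored at
    the C-terminus is added (an assumption, not the authors' stated rule).
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than peptide length")
    if len(sequence) <= length:
        return [sequence]

    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # cover the remaining C-terminal residues
    return [sequence[s:s + length] for s in starts]


# Placeholder sequence of the assumed spike length (not the real sequence).
placeholder = ("ACDEFGHIKLMNPQRSTVWY" * 64)[:1273]
peptides = tile_peptides(placeholder)
print(len(peptides))
```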
two class i mps were synthesized based on epitope predictions for those 12 most common hla a and b alleles (grifoni et al., 2020), which collectively encompass 628 predicted hla class i cd8+ t cell epitopes from the entire sars-cov-2 proteome (cd8 mp-a and mp-b). to test for the generation of sars-cov-2 cd4+ and cd8+ t cell responses following infection, we initially recruited 20 adult patients who had recovered from covid-19 disease (table 1). we also utilized pbmc and plasma samples from local healthy control donors collected in 2015-2018 (see star methods). blood samples were collected at 20-35 days post-symptoms onset from non-hospitalized covid-19 patients who were no longer symptomatic. sars-cov-2 infection was determined by swab test viral pcr during the acute phase of the infection. verification of sars-cov-2 exposure was attempted both by lateral flow serology and sars-cov-2 spike protein receptor binding domain (rbd) elisa (stadlbauer et al., 2020), using plasma from the convalescence stage blood draw. most patients were confirmed positive by lateral flow ig tests (table 1). all patients were confirmed covid-19 cases by sars-cov-2 rbd elisa (fig. 1, s1). all cases were igg positive; anti-rbd igm and iga were also detected in the large majority of cases (fig. 1, s1). we defined a 21-color flow cytometry panel of mononuclear leukocyte lineage and phenotypic markers (table s2) to broadly assess the immunological cellular profile of recovered covid-19 patients (fig. 1, s2). the frequency of cd3+ cells was slightly increased in recovered covid-19 patients relative to non-exposed controls, while no significant differences overall were observed in the frequencies of cd4+ or cd8+ t cells between the two groups. frequencies of cd19+ cells were somewhat decreased, while no differences were observed in the frequencies of cd3−cd19− cells or cd14+cd16− monocytes (fig. 1, s2).
no evidence of general lymphopenia was observed in the convalescing patients, consistent with the literature. next, we utilized the sars-cov-2 mps to probe cd4+ and cd8+ t cell responses. we utilized t cell receptor (tcr)-dependent activation-induced marker (aim) assays to identify and quantify sars-cov-2−specific cd4+ t cells in recovered covid-19 patients. initial definition and assessment of human antigen-specific sars-cov-2 t cell responses are best made with direct ex vivo t cell assays using broad-based epitope pools, such as mps, and assays capable of detecting t cells of unknown cytokine polarization and functional attributes. aim assays are cytokine-independent assays to identify antigen-specific cd4+ t cells (reiss et al., 2017). aim assays have been successfully used to identify virus-specific, vaccine-specific, or tuberculosis-specific cd4+ t cells in a range of studies (dan et al., 2019; dan et al., 2016; herati et al., 2017; morou et al., 2019). we stimulated pbmcs from 10 covid-19 cases and 11 healthy controls (sars-cov-2 unexposed, collected in 2015-2018) with a spike mp (mp_s) and the class ii mp covering the remainder of the sars-cov-2 orfeome ("non-spike", mp cd4_r). a cmv mp was used as a positive control, while dmso was used as the negative control (fig. 2, s3). sars-cov-2 spike-specific cd4+ t cell responses (ox40+cd137+) were detected in 100% of covid-19 cases (p < 0.0001 vs. unexposed donors, spike mp, fig. 2a-b; p = 0.002 vs. dmso control, fig. 2c). cd4+ t cell responses to the remainder of the sars-cov-2 orfeome were also detected in 100% of covid-19 cases (p < 0.0079 vs. unexposed donors, non-spike mp, fig. 2a-b; p = 0.002, non-spike vs. dmso control, fig. 2c). the magnitude of the sars-cov-2−specific cd4+ t cell responses measured was similar to that of the cmv mp (fig. s3c). the concordance between sars-cov-2−specific cd4+ t cell measurements in independent experiments was high (p < 0.0002, fig. s3d).
to assess functionality and polarization of the sars-cov-2−specific cd4+ t cell response, we measured cytokines secreted in response to mp stimulation. the sars-cov-2−specific cd4+ t cells were functional, as the cells produced il-2 in response to non-spike and spike mps (fig. 2d). polarization of the cells appeared to be a classical th1 type, as substantial ifnγ was produced (fig. 2e), while little to no il-4, il-5, il-13, or il-17α was expressed (fig. s3g-j). thus, recovered covid-19 patients consistently generated a substantial cd4+ t cell response against sars-cov-2. similar conclusions were reached using stimulation index as the metric (fig. s3e-f). in terms of total cd4+ t cell response per donor (fig. 2a), on average ~50% of the detected response was directed against the spike protein, and ~50% was directed against the mp representing the remainder of the sars-cov-2 orfeome (fig. 2a). this is of significance, since the sars-cov-2 spike protein is a key component of the vast majority of candidate covid-19 vaccines under development. of note, given the nature of the mp_r peptide predictions, the actual cd4+ t cell response to be ascribed to non-spike orfs was likely to be higher, addressed in further experiments below. to measure sars-cov-2−specific cd8+ t cells in the recovered covid-19 patients, we utilized two complementary methodologies, aim assays and intracellular cytokine staining (ics). the two sars-cov-2 class i mps were used, cd8-a and cd8-b, with cmv mp and dmso serving as positive and negative controls, respectively (fig. 3, s4). cd8+ t cell responses were detected by aim (cd69+cd137+) in 70% of covid-19 cases (p < 0.0011 vs. unexposed donors, "cd8 total", fig. 3a-b; p = 0.002, cd8-a or cd8-b vs. dmso control, fig. s4b). mp cd8-a contains spike epitopes, among epitopes to other proteins. the magnitude of the sars-cov-2 reactive cd8+ t cell responses measured by aim was somewhat lower than the cmv mp (fig. s4c).
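the two ways of expressing an aim response mentioned above, a background-subtracted frequency and a stimulation index, can be sketched in python as follows. the definitions used here (subtracting the dmso frequency, and dividing the stimulated frequency by the dmso frequency) are the conventional ones and are assumed rather than taken from this text. the function names and the `min_background` floor, which guards against division by zero, are hypothetical.

```python
def background_subtracted(stim_pct, dmso_pct):
    """AIM+ frequency (% of parent) after subtracting the DMSO control.

    Negative differences are reported as zero.
    """
    return max(stim_pct - dmso_pct, 0.0)


def stimulation_index(stim_pct, dmso_pct, min_background=0.001):
    """Ratio of stimulated to DMSO-control AIM+ frequency.

    min_background is a hypothetical floor for the denominator so that a
    control well with no events does not produce a division by zero.
    """
    return stim_pct / max(dmso_pct, min_background)


# Hypothetical donor: 0.30% AIM+ cells with peptide pool, 0.05% with DMSO.
print(background_subtracted(0.30, 0.05))
print(stimulation_index(0.30, 0.05))
```

the two metrics can rank donors differently when background varies between samples, which is one reason to check that both lead to the same conclusions.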
similar conclusions were reached using stimulation index (fig. s3d-e). independently, ics assays detected ifnγ+ sars-cov-2−specific cd8+ t cells in the majority of covid-19 cases (fig. 3c-d). the majority of ifnγ+ cells co-expressed granzyme b (fig. 3d-e). a substantial fraction of the ifnγ+ cells expressed tnf, but not il-10 (fig. 3d). thus, the majority of recovered covid-19 patients generated a cd8+ t cell response against sars-cov-2.

relationship between sars-cov-2−specific cd4+ t cell responses and igg and iga titers

most protective antibody responses are dependent on cd4+ t cell help. therefore, we assessed whether stronger sars-cov-2−specific cd4+ t cell responses were associated with higher antibody titers in covid-19 cases. given that spike is the primary target of sars neutralizing antibodies, we examined spike-specific cd4+ t cells. spike-specific cd4+ t cell responses correlated well with the magnitude of the anti-spike rbd igg titers (r = 0.81, p < 0.0001, fig. 4a; similar results were obtained using stimulation index, fig. s5a). the non-spike sars-cov-2−specific cd4+ t cell response did not correlate as well with anti-spike rbd igg titers (fig. 4b, s5b), consistent with a common requirement for intramolecular cd4+ t cell help. anti-spike iga titers also correlated with spike-specific cd4+ t cells (p < 0.0002, fig. s5). thus, covid-19 patients make anti-spike rbd antibody responses commensurate with the magnitude of their spike-specific cd4+ t cell response. we then assessed the relationship between the cd4+ and cd8+ t cell responses to sars-cov-2. sars-cov-2−specific cd4+ and cd8+ t cell responses were well correlated (r = 0.62, p = 0.0025, fig. 4c and s5). thus, antibody, cd4+ and cd8+ t cell responses to sars-cov-2 were generally well correlated. while spike− and non-spike−specific cd4+ t cell responses were detectable in all covid-19 cases, cells were also detected in unexposed individuals (fig. 3a-b).
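the correlations reported above between t cell responses and antibody titers (for example r = 0.81 for spike-specific cd4+ t cells versus anti-rbd igg) can be computed per donor as sketched below. the statistic used is not named in this section, so the rank-based spearman coefficient shown here is an assumption. the helper names are hypothetical, and no p value is calculated.

```python
import math


def average_ranks(values):
    """1-based ranks of the values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

here `x` would hold one response magnitude per donor (for instance the spike-specific cd4+ frequency) and `y` the matching titer for the same donors, in the same order.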
these responses were statistically significant for non-spike-specific cd4+ t cell reactivity (non-spike, p = 0.039; spike, p = 0.067; fig. 5a-b). non-spike−specific cd4+ t cell responses were above the limit of detection in 50% of donors based on si (fig. s3e). all of the donors were recruited between 2015 and 2018, excluding any possibility of exposure to sars-cov-2. four human coronaviruses are known causes of seasonal 'common cold' upper respiratory tract infections: hcov-oc43, hcov-hku1, hcov-nl63, and hcov-229e. we tested the sars-cov-2 unexposed donors for seroreactivity to hcov-oc43 and hcov-nl63 as a representative betacoronavirus and alphacoronavirus, respectively. all donors were igg seropositive to hcov-oc43 and hcov-nl63 rbd, to varying degrees (fig. 5c), consistent with the endemic nature of these viruses (gorse et al., 2010; huang et al., 2020; severance et al., 2008). we therefore examined whether these represented true pan-coronavirus t cells capable of recognizing sars-cov-2 epitopes. a most pressing, yet unresolved, set of issues in understanding sars-cov-2 immune responses is which antigens are targeted by cd4+ and cd8+ t cells, whether the corresponding antigens are the same or different, and how they relate to the antigens currently considered for covid-19 vaccine development. we synthesized sets of overlapping peptides spanning the entire sequence of sars-cov-2, and pooled them separately so that each pool would represent one antigen (with the exception of nsp3, for which two pools were made; table s1). in the case of cd4+ t cell responses, no obvious pattern of antigen specificity was observed based on sars-cov-2 genome organization; however, coronaviruses increase protein synthesis of certain orfs in infected cells via subgenomic rnas. accounting for the relative abundance of subgenomic rnas (fig. 6a) (irigoyen et al., 2016; snijder et al., 2003; xie et al., 2020), the orfs were re-ordered based on predicted protein abundance (fig.
6b). a clear hierarchy of sars-cov-2−specific cd4+ t cell targets was then apparent, with the majority of the cd4+ t cell response in covid-19 cases directed against highly expressed sars-cov-2 orfs spike, m, and n. on average, these antigens accounted for 27%, 21% and 11% of the total cd4+ t cell response, respectively. most covid-19 cases also had cd4+ t cells specific for sars-cov-2 nsp3, nsp4 and orf8 (fig. 6b), on average each accounting for ~5% of the total cd4+ t cell response (fig. 6c). e, orf6, hypothetical orf10, and nsp1 are all small antigens (or potentially not expressed, in the case of orf10) and were most likely predominantly unrecognized as a result. these results are somewhat unexpected, because data for other coronaviruses, from 27 different studies curated in the iedb, reported that spike accounted for nearly two-thirds of reported cd4+ t cell reactivity (table s3). n accounted for most of the remaining epitopes in the published literature, although human n-specific cd4+ t cell responses were not observed in one of the most comprehensive studies of human sars-cov-1 t cell responses (li et al., 2008). coronavirus m has not previously been described as a prominent target of cd4+ t cell responses (table s3). in sum, these results, fully scanning the sars-cov-2 orfeome, demonstrate a pattern of robust and diverse sars-cov-2−specific cd4+ t cell reactivity in convalescing covid-19 cases that correlated largely with predicted viral protein abundance in infected cells. when examining the non-exposed donors, the pattern of cd4+ t cell targets changed. while s was still a relatively prominent target (23% of total, on average), there was no, or marginal, reactivity against sars-cov-2 n and m. among donors with detectable cd4+ t cells, a shift in reactivity was observed towards sars-cov-2 nsp14 (25%), nsp4 (15%) and nsp6 (14%) (fig. 6b-c).
sars-cov-2−reactive cd4+ t cells were detected in at least six different unexposed donors, demonstrating that the crossreactivity is relatively widely distributed (fig. s6a). having scanned the full sars-cov-2 orfeome for cd4+ t cell reactivity in multiple donors, it was possible to assess whether the epitope prediction mp approach successfully enriched for sars-cov-2 epitopes targeted by human cd4+ t cells. when the total reactivity observed with the cd4_r mp was plotted versus the sum total of all antigen pools (excluding spike, given that spike predictions were not included in the cd4_r mp), a significant correlation was observed (p < 0.0002, fig. s6c). the single mp-r captured ~50% (44%, range 28-80%) of the non-spike response per covid-19 donor, demonstrating the success of the prediction approach, which, as mentioned above, was devised to attempt to capture approximately 50% of the total response (dhanda et al., 2018; paul et al., 2015). in the case of cd8+ t cell responses, the data in the literature from other coronaviruses (57 different studies curated in the iedb, table s3) reported spike accounting for 50% and n accounting for 36% of the defined epitopes. in a large study of human sars-cov-1 responses, spike was reported as essentially the only target of cd8+ t cell responses (li et al., 2008), while in a study of mers cd8+ t cells, responses were noted for spike, n and a pool of m/e peptides (zhao et al., 2017). few epitopes have been reported from other coronavirus antigens (table s3). here, we scanned the full sars-cov-2 orfeome for cd8+ t cell recognition. our data indicate a somewhat different pattern of immunodominance for sars-cov-2 cd8+ t cell reactivity (fig. 6d-e), with spike protein accounting for ~26% of the reactivity, and n accounting for ~12%. significant reactivity in covid-19 recovered subjects was derived from other antigens, such as m (22%), nsp6 (15%), orf8 (10%) and orf3a (7%) (fig. 6d-e).
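the per-antigen percentages and the megapool capture figure quoted above are both ratios of pool responses within a donor. the python sketch below shows the arithmetic for a single donor. the function names and the example numbers are hypothetical, and it is an assumption that each value is a background-subtracted response to one antigen pool. how the published averages were aggregated across donors is not specified here.

```python
def antigen_shares(pool_responses):
    """Percentage of a donor's summed response attributable to each antigen.

    pool_responses maps antigen name -> background-subtracted response;
    negative values are treated as zero.
    """
    clipped = {a: max(v, 0.0) for a, v in pool_responses.items()}
    total = sum(clipped.values())
    if total == 0:
        return {a: 0.0 for a in clipped}
    return {a: 100.0 * v / total for a, v in clipped.items()}


def megapool_capture(mp_response, pool_responses, exclude=("spike",)):
    """Percentage of the summed non-spike pool response seen with one MP.

    Returns None when the donor has no non-spike response to compare with.
    """
    denom = sum(max(v, 0.0) for a, v in pool_responses.items()
                if a not in exclude)
    return 100.0 * mp_response / denom if denom else None


# Hypothetical donor with responses to three antigen pools.
donor = {"spike": 2.0, "m": 1.0, "n": 1.0}
print(antigen_shares(donor))
print(megapool_capture(1.0, donor))
```

spike is excluded from the denominator of the capture calculation because, as noted above, spike predictions were not part of the cd4_r mp.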
in unexposed donors, sars-cov-2−reactive cd8+ t cells were detected in at least four different donors (fig. s7), with less clear targeting of specific sars-cov-2 proteins than was observed for cd4+ t cells, suggesting that coronavirus cd8+ t cell crossreactivity exists but is less widespread than cd4+ t cell crossreactivity. there is a critical need for foundational knowledge about t cell responses to sars-cov-2. here we report functional validation of predicted epitopes when arranged in epitope "megapools", utilizing pbmcs derived from convalescing covid-19 cases. the experiments also used protein-specific peptide pools to determine which sars-cov-2 proteins are the predominant targets of human sars-cov-2−specific cd4+ and cd8+ t cells generated during covid-19 disease. importantly, we utilized the exact same series of experimental techniques with blood samples from healthy control donors (pbmcs collected in the 2015-2018 time frame), and substantial crossreactive coronavirus t cell memory was observed. our results demonstrate that the epitope mps are reagents well suited to analyze and detect sars-cov-2-specific t cell responses with limited sample material. we also developed and tested peptide pools corresponding to each of the 25 proteins encoded in the sars-cov-2 genome. data from both the epitope mps and protein peptide pool experiments can be interpreted in the context of previously reported t cell response immunodominance patterns observed for other coronaviruses, particularly the sars and mers viruses, which have been studied in humans, hla-transgenic mice, wild-type mice and other species. in the case of cd4+ t cell responses, data for other coronaviruses found that spike accounted for nearly two-thirds of reported cd4+ t cell reactivity, with n and m accounting for limited reactivity, and no reactivity in one large study of human sars-cov-1 responses (li et al., 2008).
our sars-cov-2 data reveal that the pattern of immunodominance in covid-19 is different. in particular, m, spike and n proteins were clearly co-dominant, each recognized by 100% of covid-19 cases studied here. significant cd4+ t cell responses were also directed against nsp3, nsp4, orf3a, orf7a, nsp12 and orf8. these data suggest that a candidate covid-19 vaccine consisting only of sars-cov-2 spike would be capable of eliciting sars-cov-2−specific cd4+ t cell responses of similar representation to that of natural covid-19 disease, but the data also indicate that there are many potential cd4+ t cell targets in sars-cov-2 and inclusion of additional sars-cov-2 structural antigens such as m and n would better mimic the natural sars-cov-2−specific cd4+ t cell response observed in mild to moderate covid-19 disease. regarding sars-cov-2 cd8+ t cell responses, the pattern of immunodominance found here differed from the literature for other coronaviruses. however, stringent comparisons are not possible, as some earlier studies were not similarly comprehensive and did not utilize the same experimental strategy. the spike protein was a target of human sars-cov-2 cd8+ t cell responses, but it was not dominant. sars-cov-2 m was just as strongly recognized, and significant reactivity was noted for other antigens, mostly nsp6, orf3a, and n, which comprised nearly 50% of the total cd8+ t cell response, on average. thus, these data indicate that candidate covid-19 vaccines endeavoring to elicit cd8+ t cell responses against the spike protein will be eliciting a relatively narrow cd8+ t cell response compared to the natural cd8+ t cell response observed in mild to moderate covid-19 disease. an optimal vaccine cd8+ t cell response to sars-cov-2 might benefit from additional class i epitopes, such as the ones derived from m, nsp6, orf3a and/or n.
there have been concerns regarding vaccine enhancement of disease by certain candidate covid-19 vaccine approaches, via antibody-dependent enhancement (ade) or development of t h 2 responses (peeples, 2020). herein, we saw predominant t h 1 responses in convalescing covid-19 cases, with little to no t h 2 cytokines. clearly more studies are required, but the data here appear to predominantly represent a classical t h 1 response to sars-cov-2. while it was important to identify antigen-specific t cell responses in covid-19 cases, it is also of great interest to understand whether crossreactive immunity exists between coronaviruses to any degree. a key step in developing that understanding is to examine antigen-specific cd4 + and cd8 + t cells in covid-19 cases and in unexposed healthy controls, utilizing the exact same antigens and series of experimental techniques. cd4 + t cell responses were detected in 40-60% of unexposed individuals. this may be reflective of some degree of crossreactive, preexisting immunity to sars-cov-2 in some, but not all, individuals. whether this immunity is relevant in influencing clinical outcomes is unknown, and cannot be known without t cell measurements before and after sars-cov-2 infection of individuals, but it is tempting to speculate that the crossreactive cd4 + t cells may be of value in protective immunity, based on sars mouse models (zhao et al., 2016). clear identification of the crossreactive peptides, and their sequence homology relation to other coronaviruses, requires deconvolution of the positive peptide pools, which is not feasible with the cell numbers presently available and within the time frame of the present study. regarding the value of crossreactive t cells, influenza (flu) immunology in relationship to pandemics may be instructive.
in the context of the 2009 h1n1 influenza pandemic, preexisting t cell immunity existed in the adult population, which focused on the more conserved internal influenza viral proteins (greenbaum et al., 2009) . the presence of crossreactive t cells was found to correlate with less severe disease (sridhar et al., 2013; wilkinson et al., 2012) . the frequent availability of crossreactive memory t cell responses might have been one factor contributing to the lesser severity of the h1n1 flu pandemic (hancock et al., 2009) . cross-reactive immunity to influenza strains has been modeled to be a critical influencer of susceptibility to newly emerging, potentially pandemic, influenza strains (gostic et al., 2016) . given the severity of the ongoing covid-19 pandemic, it has been modeled that any degree of crossprotective coronavirus immunity in the population could have a very substantial impact on the overall course of the pandemic, and the dynamics of the epidemiology for years to come (kissler et al., 2020) . caveats of this study include the sample size and the focus on non-hospitalized covid-19 cases. sample size was limited by expediency. the focus on non-hospitalized cases of covid-19 is a strength, in that these donors had uncomplicated disease of moderate duration, and thus it was encouraging that substantial cd4 + t cell and antibody responses were detected in all cases, and cd8 + t cell responses in the majority of cases. complementing these data with mp t cell data from acute patients and patients with complicated disease course will also be of clear value, as will studies on the longevity of sars-cov-2 immunological memory. additionally, lack of detailed information on common cold history or matched blood samples pre-exposure to sars-cov-2 prevents conclusions regarding the abundance of crossreactive coronavirus t cells before exposure to sars-cov-2 and any potential protective efficacy of such cells. 
finally, full epitope mapping in the future will add important detailed resolution of the human coronavirus-specific t cell responses. in sum, we measured sars-cov-2−specific cd4 + and cd8 + t cell responses in covid-19 cases. using multiple experimental approaches, sars-cov-2−specific cd4 + t cell and antibody responses were observed in all covid-19 cases, and cd8 + t cell responses were observed in most. importantly, pre-existing sars-cov-2−crossreactive t cell responses were observed in healthy donors, indicating some potential for pre-existing immunity in the human population. orf mapping of t cell specificities revealed valuable targets for incorporation in candidate vaccine development, and revealed distinct specificity patterns between covid-19 cases and unexposed healthy controls.
figure s4. total mp responses per donor were used in each case ("non-spike" + "spike" (cd4_r + mp_s) for cd4 + t cells, cd8_a + cd8_b for cd8 + t cells). statistical comparisons were performed using spearman correlation. see also figure s5.
(b) sars-cov-2 antigen-specific cd4 + t cells (aim + , ox40 + cd137 + ) quantified by stimulation index, using a peptide pool for each viral protein (with two exceptions, see table s1). covid-19 cases (top, in blue. n=10) and unexposed donors (bottom, in white. n=10).
(d) sars-cov-2 antigen-specific cd8 + t cells (aim + , cd69 + cd137 + ) quantified by stimulation index, using a peptide pool for each viral protein (with two exceptions, see table s1). covid-19 cases (top, in red. n=10) and unexposed donors (bottom, in grey. n=10).
table s6. representative gating of cd3 + t cells, cd19 + b cells, cd3 - cd19 - cells, cd4 + t cells, cd8 + t cells and cd14 + monocytes from donor pbmcs is shown. briefly, mononuclear cells were gated out of all events followed by subsequent singlet gating. live cells are gated as zombie uv -. cells were then gated as cd19-pe-cy5 + , cd3-buv395 + or cd19 - cd3 - cells.
t cells were further subdivided into either cd8-buv805 + or cd4-percp-efluor710 + populations. cd3 - cd19 - cells were defined as cd56-pe-dazzle bright nk cells, cd56 dim cd16-buv737 + nk cells or cd56 - monocytes. monocytes were further classified on differential expression of cd14-bv510 and cd16. (c) cmv-specific cd8 + t cells as percentage of aim + (cd69 + cd137 + ) cd8 + t cells after stimulation of pbmcs with cmv peptide pool. data were background subtracted against dmso negative control and are shown with geometric mean and geometric standard deviation. samples were from unexposed donors ("unexposed", n=11) and recovered covid-19 patients ("covid-19", n=10). (d-e) stimulation index quantitation of aim + (cd69 + cd137 + ) cd8 + t cells; the same samples as in figure 2 and figure s4c were analyzed. statistical comparisons across cohorts were performed with the mann-whitney test, while paired sample comparisons were performed with the wilcoxon test. **p<0.01; ***p<0.001. ns, not significant. (e) correlation between sars-cov-2−specific cd4 + t cells and sars-cov-2−specific cd8 + t cells, using stimulation index. total mp responses per donor were used in each case ("non-spike" + "spike" (cd4_r + mp_s) for cd4 + t cells, cd8-a + cd8-b for cd8 + t cells). statistical comparisons were performed using spearman correlation. figure 6. (a) the same data as figure 6b, but with each unexposed donor color coded. (b) the same experiment as figure 6b, but with sars-cov-2−specific cd4 + t cells measured as percentage of aim + (ox40 + cd137 + ) cd4 + t cells, after background subtraction. covid-19 cases (top, in blue. n=10) and unexposed donors (bottom, in white. n=10). (c) correlation of sars-cov-2−specific cd4 + t cells detected using the epitope prediction approach (cd4_r mp) compared against the sum total of all antigen pools of overlapping peptides (excluding spike), run with samples from the same donors in two different experiment series. dotted line indicates 1:1 concordance.
statistical comparison was performed using spearman correlation. figure 6 . (a) the same data as figure 6d , but with each unexposed donor color coded. (b) the same experiment as figure 6d , but with sars-cov-2−specific cd8 + t cells measured as percentage of aim + (cd69 + cd137 + ) cd8 + t cells, after background subtraction. covid-19 cases (top, in red. n=10) and unexposed donors (bottom, in grey. n=10). further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, dr. alessandro sette (alex@lji.org). aliquots of synthesized sets of peptides utilized in this study will be made available upon request. there are restrictions to the availability of the peptide reagents due to cost and limited quantity. the published article includes all data generated or analyzed during this study, and summarized in the accompanying tables, figures and supplemental materials. healthy unexposed donors. samples from healthy adult donors were obtained by the la jolla institute for immunology (lji) clinical core or provided by a commercial vendor (carter blood care) for prior, unrelated studies between early 2015 and early 2018. these samples were considered to be from unexposed controls, given that sars-cov-2 emerged as a novel pathogen in late 2019, more than one year after the collection of any of these samples. these donors were considered healthy in that they had no known history of any significant systemic diseases, including, but not limited to, autoimmune disease, diabetes, kidney or liver disease, congestive heart failure, malignancy, coagulopathy, hepatitis b or c, or hiv. an overview of the characteristics of these unexposed donors is provided in table 1 . the lji institutional review board approved the collection of these samples (lji; vd-112). 
at the time of enrollment in the initial studies, all individual donors provided informed consent that their samples could be used for future studies, including this study. convalescent covid-19 donors. the institutional review boards of the university of california, san diego (ucsd; 200236x) and la jolla institute (lji; vd-214) approved blood draw protocols for convalescent donors. all human subjects were assessed for capacity using a standardized and approved assessment. subjects deemed to have capacity voluntarily gave informed consent prior to being enrolled in the study. individuals did not receive compensation for their participation. study inclusion criteria included subjects over the age of 18 years, regardless of disease severity, race, ethnicity, gender, pregnancy or nursing status, who were willing and able to provide informed consent, or with a legal guardian or representative willing and able to provide informed consent when the participant could not personally do so. study exclusion criteria included lack of willingness or ability to provide informed consent, or lack of an appropriate legal guardian or representative to provide informed consent. blood from convalescent donors was obtained at a uc san diego health clinic. blood was collected in acid citrate dextrose (acd) tubes and stored at room temperature prior to processing for pbmc isolation and plasma collection. a separate serum separator tube (sst) was collected from each donor. samples were de-identified prior to analysis. other efforts to maintain the confidentiality of participants included referring to specimens and other records via an assigned, coded identification number. 
prior to enrollment in the study, donors were asked to provide proof of positive testing for sars-cov-2, and screened for clinical history and/or epidemiological risk factors consistent with the world health organization (who) or centers for disease control and prevention (cdc) case definitions of covid-19 or persons under investigation (pui) (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/surveillance-and-case-definitions, https://www.cdc.gov/coronavirus/2019-ncov/hcp/clinical-criteria.html). per cdc and who guidance, clinical features consistent with covid-19 included subjective or measured fever, or signs or symptoms of lower respiratory tract illness (e.g. cough or dyspnea). epidemiologic risk factors included close contact with a laboratory-confirmed case of sars-cov-2 within 14 days of symptom onset or a history of travel to an area with a high rate of covid-19 cases within 14 days of symptom onset. disease severity was defined as mild, moderate, severe or critical based on a modified version of the who interim guidance, "clinical management of severe acute respiratory infection when covid-19 is suspected" (who reference number: who/2019-ncov/clinical/2020.4). mild disease was defined as an uncomplicated upper respiratory tract infection (uri) with potential non-specific symptoms (e.g. fatigue, fever, cough with or without sputum production, anorexia, malaise, myalgia, sore throat, dyspnea, nasal congestion, headache; rarely diarrhea, nausea and vomiting) that did not require hospitalization. moderate disease was defined as the presence of lower respiratory tract disease or pneumonia without the need for supplemental oxygen, without signs of severe pneumonia, or a uri requiring hospitalization (including observation admission status).
severe disease was defined as severe lower respiratory tract infection or pneumonia with fever plus any one of the following: tachypnea (respiratory rate > 30 breaths per minute), respiratory distress, or oxygen saturation less than 93% on room air. critical disease was defined as the need for icu admission or the presence of acute respiratory distress syndrome (ards), sepsis, or septic shock, as defined in the who guidance document. convalescent donors were screened for symptoms prior to scheduling blood draws, and had to be symptom-free and approximately 3 weeks out from symptom onset at the time of the initial blood draw. following enrollment, whole blood from convalescent donors was run on a colloidal-gold immunochromatographic 'lateral flow' assay to evaluate for prior exposure to sars-cov-2. this assay detects igm or igg antibodies directed against recombinant sars-cov-2 antigen labeled with a colloidal gold tracer (20/20 bioresponse coronacheck). ninety percent of convalescent donors tested positive for igm or igg to sars-cov-2 by this assay (table 1) . convalescent donors were california residents, who were either referred to the study by a health care provider or self-referred. the majority (75%) of donors had a known sick contact with covid-19 or suspected exposure to sars-cov-2 ( table 1) . the most common symptoms reported were cough, fatigue, fever, anosmia, and dyspnea. seventy percent of donors experienced mild illness. donors were asked to self-report any known medical illnesses. of note, 65% of these individuals had no known underlying medical illnesses. epitope megapool (mp) design and preparation. sars-cov-2 virus-specific cd4 and cd8 peptides were synthesized as crude material (a&a, san diego, ca), resuspended in dmso, pooled and sequentially lyophilized as previously reported . 
sars-cov-2 epitopes were predicted using the protein sequences derived from the sars-cov-2 reference (genbank: mn908947) and the iedb analysis resource as previously described (grifoni et al., 2020). specifically, cd4 sars-cov-2 epitope prediction was carried out using a previously described approach in the tepitool resource in iedb (paul et al., 2016), to select peptides with median consensus percentile ≤ 20, similar to what was previously described, but removing the resulting spike glycoprotein epitopes from this prediction (cd4-r (remainder) "non-spike" mp, n=221). this approach takes advantage of the extensive cross-reactivity and repertoire overlap between different hla class ii loci and allelic variants to predict promiscuous epitopes, capable of binding across the most common hla class ii prototypic specificities (greenbaum et al., 2011; o'sullivan et al., 1991; sidney et al., 2010a, b; southwood et al., 1998). the algorithm utilizes predictions for seven common hla-dr alleles (drb1*03:01, drb1*07:01, drb1*15:01, drb3*01:01, drb3*02:02, drb4*01:01 and drb5*01:01) empirically determined to allow coverage of diverse populations and for different pathogens and antigen systems (dhanda et al., 2018; paul et al., 2015). to investigate spike-specific cd4 t cells in depth, 15-mer peptides (overlapping by 10 amino acids) spanning the entire antigen were synthesized and pooled separately (cd4-s (spike) mp, n=253). in the case of cd8 epitopes, since the overlap between different hla class i allelic variants and loci is more limited to specific groups of alleles, or supertypes (sidney et al., 2008), we targeted a set of the 12 most prominent hla class i a and b alleles (a*01:01, a*02:01, a*03:01, a*11:01, a*23:01, a*24:02, b*07:02, b*08:01, b*35:01, b*40:01, b*44:02, b*44:03), which have been shown to allow broad coverage of the general population.
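the overlapping-peptide design used for the spike pool (15-mers overlapping by 10 amino acids) can be illustrated with a short sketch. this is not the authors' code: the function name, the toy sequence, and in particular the handling of the c-terminal end (adding one final peptide anchored at the last residue when the regular windows stop short) are assumptions made for illustration.

```python
def overlapping_peptides(sequence, length=15, overlap=10):
    """Tile a protein sequence into fixed-length peptides.

    Consecutive peptides share `overlap` residues, so the window advances by
    `length - overlap` residues each time. Assumption (not stated in the
    text): if the last regular window stops short of the C-terminus, one
    extra peptide anchored at the final residue is added so that the whole
    sequence is covered.
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the peptide length")
    if len(sequence) <= length:
        return [sequence]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + length] for s in starts]


# Hypothetical 26-residue toy sequence, used only to show the windows.
toy = "ACDEFGHIKLMNPQRSTVWYACDEFG"
peptides = overlapping_peptides(toy)
for peptide in peptides:
    print(peptide)
```

under these assumptions a 1273-residue input, the length commonly cited for the sars-cov-2 spike reference sequence (a value not given in this text), yields 253 peptides, which is consistent with the spike pool size reported above (n=253).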
cd8 sars-cov-2 epitope prediction was performed as previously reported, using the netmhcpan el 4.0 algorithm (jurtz et al., 2017) for the 12 most frequent hla alleles and selecting the top 1 percentile of predicted epitopes per hla allele, clustered with nested/overlap reduction (grifoni et al., 2020). the 628 predicted cd8 epitopes were split into two cd8 mps containing 314 peptides each (cd8-a and cd8-b). the cmv mp is a pool of previously reported class i and class ii epitopes. protein peptide pools. in the case of the protein pools, peptides of 15 amino acid length overlapping by 10 spanning each entire protein sequence were tested in a single mp (6-253 peptides per pool). table s1 lists the number of peptides pooled for each of the viral proteins. upon request we are prepared to make these mp available to the scientific community for use in a diverse set of investigations. for all samples whole blood was collected in acd tubes (covid-19 donors) or heparin-coated blood bags (healthy unexposed donors). whole blood was then centrifuged for 15 min at 1850 rpm to separate the cellular fraction and plasma. the plasma was then carefully removed from the cell pellet and stored at -20°c. peripheral blood mononuclear cells (pbmc) were isolated by density-gradient sedimentation using ficoll-paque (lymphoprep, nycomed pharma, oslo, norway) as previously described (weiskopf et al., 2013). isolated pbmc were cryopreserved in cell recovery media containing 10% dmso (gibco), supplemented, depending on the processing laboratory, with 10% heat-inactivated fetal bovine serum (fbs; hyclone laboratories, logan ut), and stored in liquid nitrogen until used in the assays. sars-cov-2 receptor binding domain (rbd) protein was obtained courtesy of florian krammer and peter kim (stadlbauer et al., 2020). corning 96-well half-area plates (thermofisher 3690) were coated with 1 µg/ml sars-cov-2 rbd overnight at 4°c.
elisa protocol generally followed that of the krammer lab, which previously demonstrated specificity (stadlbauer et al., 2020). plates were blocked the next day with 3% milk (skim milk powder thermofisher lp0031 by weight/volume) in phosphate buffered saline (pbs) containing 0.05% tween-20 (thermoscientific j260605-ap) for 2 hours at room temperature. plasma was then added to the plates and incubated for 1.5 hours at room temperature. prior to plasma addition to the plates, plasma was heat inactivated at 56°c for 30-60 minutes. plasma was diluted in 1% milk in 0.05% pbs-tween 20, starting at a 1:3 dilution and serially diluting each sample by 1:3. plates were then washed 5 times with 0.05% pbs-tween 20. secondary antibodies were diluted in 1% milk in 0.05% tween-20 and incubated for 1 hour. for igg, anti-human igg peroxidase antibody produced in goat (sigma a6029) was used at a 1:5000 dilution. for igm, anti-human igm peroxidase antibody produced in goat (sigma a6907) was used at a 1:10,000 dilution. for iga, anti-human iga horseradish peroxidase antibody (hybridoma reagent laboratory hp6123-hrp) was used at a 1:1,000 dilution. plates were washed 5 times with 0.05% pbs-tween 20. plates were developed with tmb substrate kit (thermoscientific 34021) for 15 minutes at room temperature. the reaction was stopped with 2m sulfuric acid. plates were read on a spectramax plate reader at 450 nm using softmax pro, and ods were background subtracted. a positive control standard was created by pooling plasma from six convalescing covid-19 patients. the positive control standard was run on each plate and was used to calculate titers (relative units) for all samples using non-linear regression interpolations, done to quantify the amount of anti-rbd igg, anti-rbd igm, and anti-rbd iga present in each specimen. titers were plotted for each specimen and compared to covid-19 negative specimens.
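the dilution series and the reading of relative units off the pooled standard can be sketched as follows. this is a minimal illustration, not the authors' analysis: the number of dilution steps, the relative units assigned to the standard, and all od values are hypothetical, and piecewise log-linear interpolation stands in here for the non-linear regression used in the study.

```python
import math


def serial_dilutions(first=3, fold=3, steps=8):
    """Dilution factors for a series that starts at 1:first and is diluted
    1:fold at every step. `steps` is an assumption; the text does not state
    how many dilutions were run."""
    return [first * fold ** i for i in range(steps)]


def interpolate_units(od, standard_curve):
    """Read relative units for a background-subtracted OD off a standard curve.

    `standard_curve` is a list of (relative_units, od) pairs measured for the
    pooled positive control on the same plate. Interpolation is linear in
    log10(units) between the two neighbouring standard points (a stand-in for
    non-linear regression). Returns None when the OD lies outside the range
    covered by the standard.
    """
    points = sorted(standard_curve, key=lambda p: p[1])  # ascending OD
    if od < points[0][1] or od > points[-1][1]:
        return None
    for (u_lo, od_lo), (u_hi, od_hi) in zip(points, points[1:]):
        if od_lo <= od <= od_hi:
            if od_hi == od_lo:
                return u_lo
            frac = (od - od_lo) / (od_hi - od_lo)
            log_units = math.log10(u_lo) + frac * (math.log10(u_hi) - math.log10(u_lo))
            return 10 ** log_units
    return None


# Hypothetical standard: arbitrary relative units paired with made-up ODs.
standard = [(100.0, 2.0), (10.0, 1.0), (1.0, 0.2)]
dilutions = serial_dilutions(steps=4)
sample_units = interpolate_units(1.5, standard)
print(dilutions, sample_units)
```

returning None outside the standard range is a deliberate choice in this sketch, since extrapolating beyond the measured standard points would give unreliable relative units.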
as a second analytical approach, area under the curve was also calculated for each specimen to compare covid-19 to negative specimens, using a baseline of 0.05 for peak calculations. an in-house elisa at unc was performed by coating with recombinant s rbd antigens (sars-cov-2, sars-cov, oc43-cov and nl63-cov) in tbs for 1 h at 37°c. after blocking, we added 1:20 diluted serum and incubated at 37°c for 1 h. antigen-specific antibodies (ig) were measured at 405 nm by using alkaline phosphatase conjugated goat anti-human igg, iga and igm abs and 4-nitrophenyl phosphate. direct ex vivo pbmc immune cell phenotyping. for the surface stain, 1x10^6 pbmcs were resuspended in 100 µl pbs with 2% fbs (facs buffer) and stained with antibody cocktail for 1 hour at 4°c in the dark. following surface staining, cells were washed twice with facs buffer. cells were then fixed/permeabilized for 40 min at 4°c in the dark using the ebioscience foxp3 transcription factor buffer kit (thermofisher scientific, waltham, ma). following fixation/permeabilization, cells were washed twice with 1x permeabilization buffer, resuspended in 100 µl permeabilization buffer and stained with intracellular/intranuclear antibodies for 1 hour at 4°c in the dark. samples were washed twice with 1x permeabilization buffer following staining. after the final wash, cells were resuspended in 200 µl facs buffer. all samples were acquired on a bd facsymphony cell sorter (bd biosciences, san diego, ca). a list of antibodies used in this panel can be found in table s2. t cell stimulations. for all flow cytometry assays of stimulated t cells, cryopreserved cells were thawed by diluting them in 10 ml complete rpmi 1640 with 5% human ab serum (gemini bioproducts) in the presence of benzonase [20 µl/10 ml]. all samples were acquired on a ze5 cell analyzer (bio-rad laboratories), and analyzed with flowjo software (tree star, san carlos, ca). activation induced cell marker assay.
cells were cultured for 24 hours in the presence of sars-cov-2 specific mps [1 µg/ml] or 10 µg/ml pha in 96-well u-bottom plates at 1x10^6 pbmc per well. stimulation with an equimolar amount of dmso was performed as a negative control; phytohemagglutinin (pha, roche, 1 µg/ml) and stimulation with a combined cd4 and cd8 cytomegalovirus mp (cmv, 1 µg/ml) were included as positive controls. supernatants were harvested at 24 hours post-stimulation for multiplex detection of cytokines. antibodies used in the aim assay are listed in table s4. the aim assays shown in figures 2 and 3 and those shown in figure 6 had five covid-19 donors and nine unexposed donors in common. full raw data are listed in table s6. intracellular cytokine staining assay. for the intracellular cytokine staining, pbmc were cultured in the presence of sars-cov-2 specific mps [1 µg/ml] for 9 hours. golgi-plug containing brefeldin a (bd biosciences, san diego, ca) and monensin (biolegend, san diego, ca) were added 3 hours into the culture. cells were then washed and surface stained for 30 minutes on ice, fixed with 1% paraformaldehyde (sigma-aldrich, st. louis, mo) and kept at 4°c overnight. antibodies used in the ics assay are listed in table s5. the gates applied for the identification of ifnγ, gzb, tnfα, or il-10 production on the total population of cd8 + t cells were defined according to the cells cultured with dmso for each individual. supernatants were collected from 24-hour stimulation cultures of the aim assays and stored in 96 well plates at -20°c. cytokines in cell culture supernatants of the same samples used for aim were quantified using a human th cytokine panel (13-plex) kit (legendplex, biolegend) according to the manufacturer's instructions. supernatants were mixed with beads coated with capture antibodies specific for il-5, il-13, il-2, il-6, il-9, il-10, ifnγ, tnfα, il-17a, il-17f, il-4, il-21 and il-22 and incubated on a 96 well filter plate for 2 hours.
beads were washed and incubated with biotin-labelled detection antibodies for 1 hour, followed by a final incubation with streptavidin-pe. beads were analyzed by flow cytometry using a facs canto cytometer. analysis was performed using the legendplex analysis software v8.0, which distinguishes between the 13 different analytes on the basis of bead size and internal dye. to identify coronavirus epitopes and associated references, the iedb was searched (on april 16, 2020) utilizing the following queries. a first query was run to identify references associated with class i restricted cd8 epitopes, which utilized the criteria settings "antigen": organism = coronavirus (taxonomy id 11118); "assay": positive assays only; "assay": t cell assay; "mhc restriction" = mhc class i; no parameters were defined for "host" or "disease." this query identified 57 references, which are listed and displayed under the "references" tab on the results page. a second query was run to identify references associated with class ii restricted cd4 epitopes, which utilized the criteria settings "antigen": organism = coronavirus (taxonomy id 11118); "assay": positive assays only; "assay": t cell assay; "mhc restriction" = mhc class ii; no parameters were defined for "host" or "disease." this query identified 27 references, which are listed and displayed under the "references" tab on the results page. a third query was run to specifically capture epitopes and map them back to the antigen of origin using the settings "antigen": organism = coronavirus (taxonomy id 11118); "assay": positive assays only; "assay": t cell assay; no parameters were defined for "mhc restriction," "host" or "disease." results were exported as csv files, and then examined in excel to tabulate the number of cd4 and cd8 epitopes recognized in humans, mice, transgenic mice and other hosts associated with each respective antigen. data and statistical analyses were done in flowjo 10 and graphpad prism 8.4, unless otherwise stated.
the statistical details of the experiments are provided in the respective figure legends. data plotted in linear scale were expressed as mean + standard deviation (sd). data plotted in logarithmic scales were expressed as geometric mean + geometric standard deviation (sd). correlation analyses were performed using spearman, while mann-whitney or wilcoxon tests were applied for unpaired or paired comparisons, respectively. details pertaining to significance are also noted in the respective legends. t cell data have been calculated as background subtracted data or stimulation index. background subtracted data were derived by subtracting the percentage of aim + cells after dmso stimulation from the percentage of aim + cells after sars-cov-2 stimulation. stimulation index was calculated instead by dividing the percentage of aim + cells after sars-cov-2 stimulation by the percentage of aim + cells derived from dmso stimulation. if the aim + cells percentage after dmso stimulation was equal to 0, the minimum value across each cohort was used. when two stimuli were combined together, the percentages of aim + cells after each sars-cov-2 stimulation were summed, and the percentage of aim + cells derived from dmso stimulation was either subtracted twice from this sum or the sum was divided by twice that value. additional data analysis techniques are described in the methods sections above.
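the two t cell readouts described above can be written out as a short worked example. this is a sketch under stated assumptions rather than the authors' analysis code: the function names and all percentages are hypothetical, and "the minimum value across each cohort" is interpreted here as the smallest non-zero dmso percentage in the cohort.

```python
def cohort_min_nonzero(dmso_values):
    """Smallest non-zero DMSO percentage in a cohort (an interpretation of
    'the minimum value across each cohort'; this is an assumption)."""
    return min(v for v in dmso_values if v > 0)


def background_subtracted(stim_pcts, dmso_pct):
    """Sum of the AIM+ percentages of one or more stimulations, minus the
    DMSO control once per stimulation (so twice when two pools are combined)."""
    return sum(stim_pcts) - len(stim_pcts) * dmso_pct


def stimulation_index(stim_pcts, dmso_pct, cohort_floor):
    """Sum of the AIM+ percentages divided by the DMSO control, counted once
    per stimulation. A DMSO value of 0 is replaced by `cohort_floor`."""
    denominator = dmso_pct if dmso_pct > 0 else cohort_floor
    return sum(stim_pcts) / (len(stim_pcts) * denominator)


# Hypothetical AIM+ percentages for three donors of one cohort.
dmso = [0.25, 0.0, 0.125]
floor = cohort_min_nonzero(dmso)

single_bg = background_subtracted([0.5], dmso[0])
single_si = stimulation_index([0.5], dmso[0], floor)
zero_dmso_si = stimulation_index([0.75], dmso[1], floor)
combined_bg = background_subtracted([0.5, 0.25], dmso[2])
combined_si = stimulation_index([0.5, 0.25], dmso[2], floor)
print(single_bg, single_si, zero_dmso_si, combined_bg, combined_si)
```

passing the stimulations as a list lets the same two functions cover both the single-pool case and the combined case (for example cd8-a plus cd8-b), where the dmso control is counted twice.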
high prevalence of mers-cov infection in camel workers in saudi arabia sars-cov-2 vaccines: status report immune history profoundly affects broadly protective b cell responses to influenza th1 versus th2 t cell polarization by whole-cell and acellular childhood pertussis vaccines persists upon re-immunization in adolescence and adulthood sars-cov-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems the time course of the immune response to experimental coronavirus infection of man covid-19: immunopathology and its implications for therapy automatic generation of validated specific epitope sets mers-cov antibody responses 1 year after symptom onset t follicular helper cell biology: a decade of discovery and diseases definition of human epitopes recognized in tetanus toxoid and development of an assay strategy to detect ex vivo tetanus cd4+ t cell responses recurrent group a streptococcus tonsillitis is an immunosusceptibility disease involving antibody deficiency and aberrant tfh cells a cytokine-independent approach to identify antigen-specific human germinal center t follicular helper cells and rare antigen-specific cd4+ t cells in blood predicting hla cd4 immunogenicity in human populations. 
• epitope pools detect cd4+ and cd8+ t cells in 100% and 70% of convalescent covid patients
• t cell responses are focused not only on spike but also on m, n and other orfs
• t cell reactivity to sars-cov-2 epitopes is also detected in non-exposed individuals

an analysis of immune cell responses to sars-cov-2

we would like to thank cheryl kim, director of the lji flow cytometry core facility, for outstanding expertise. we thank prof. peter kim, abigail powell, phd, and colleagues (stanford) for rbd protein synthesized from prof. florian krammer (mt. sinai) constructs. j.m. was supported by ph.d. student fellowships from the departamento administrativo de ciencia, tecnología e innovación (colciencias) and pontificia universidad javeriana (convocatoria 727 doctorados nacionales). this work was funded by the nih niaid under awards ai42742 (cooperative centers for human immunology) (s.c.,

the authors declare no competing interests.

key: cord-289765-79cmcvfi authors: morrison, mike; merlo, kelsey; woessner, zach title: how to boost the impact of scientific conferences date: 2020-09-03 journal: cell doi: 10.1016/j.cell.2020.07.029 sha: doc_id: 289765 cord_uid: 79cmcvfi we can maximize the impact of scientific conferences by uploading all conference presentations, posters, and abstracts to highly trafficked public repositories for each content type.
talks can be hosted on sites like youtube and youku, posters can be published on figshare, and papers and abstracts can become open access preprints.

thanks to the heroic efforts of conference administrators across science, most academic conferences scheduled for 2020 have been at least partially translated into a virtual format. now, it is time to figure out what role online content should have in the future and how to maximize its engagement and impact. first, to get in the right mindset, it will help to stop thinking of annual scientific conferences as only updating a subset of attending scientists on what is happening in a field and start thinking of conferences as being able to update the entire world on what is happening in a field of study, especially all relevant scientists, whether they pay dues for that conference or not.

figure 1 places the "classic 3" scientific conference content types (presentations, posters, and conference abstracts) on a continuum ranging from "easy to produce" to "hard to produce" and from "small impact" (on just a few people) to "big impact" (reaching tens or hundreds of thousands of people). as shown in figure 1, presentations and posters take a great deal of effort to produce but are only seen by a narrow subset of scientists who pay to attend the conference and also manage to make time for that particular session. as for conference abstracts, it is doubtful whether most are read by anybody besides their peer reviewers and maybe a few persistent researchers who endure the login walls or request copies from the authors. as is typical of the scientific community, we do great work, and then put it where nobody can see it.

the biggest barrier to increasing the impact of scientific presentations was getting them recorded.
the technical barriers to recording scientific talks have been trivial since smartphones became ubiquitous (virtual talks are already recorded, and offline talks could be recorded by having a "designated recorder" in the audience), but until now there has been a big human barrier: scientists have an almost introverted hesitance to record our talks. happily, covid-19 has pushed us all across this threshold and forced us to get some comfort with the idea of having our talks recorded and posted online. but where is the best place to publish them?

posting talks behind a paywall is better than nothing

having all our talks recorded and posted behind a paywall (or login wall) on a conference organization's website (as many conferences are doing now during the pandemic) is a huge step toward broader impact because it helps our talks reach those scientists who attended the conference but missed the talk and those who could not afford the time or money to travel to the conference. but the audience is still limited to members who both pay dues and tolerate the often-clunky conference website.

sporadic uploading to websites like youtube and youku could start a trend toward higher impact

the impact potential of scientific talks cannot be fully realized until they are posted on major, public video sharing sites like youtube because that's where the people are (scientists and otherwise). sites like youtube (in the united states) and youku (in china) each receive over one billion views per day (similarweb, 2020). encouraging scientists to upload their talks to public video sharing sites individually would create sprinklings of high impact, but adoption would likely be patchy given that scientists lack experience and comfort with the process and may be unmotivated to try it because they are inexperienced with how rewarding broad impact can be.
mass uploading to public video sharing websites would change science and the world

to ensure that nearly all scientific conference talks are published to popular sites like youtube and youku (and to give many scientists exposure to the sublime feeling of having a broad impact), we could stop waiting on scientists to do it themselves. instead, conferences could manage the posting of all recorded talks. academic conferences are already abnormal for keeping their conference talks to themselves. in industry, uploading videos of all conference events and talks to major video sharing websites is just standard practice. the annual consumer electronics show (ces) in the united states generates thousands of youtube videos every year, covering every talk and event from multiple perspectives, all professionally recorded (consumer electronics show, 2019). science may lack the professional audio-visual staff of a major industry conference, but it makes up for this in sheer industriousness. many academic conferences since the pandemic have demonstrated that they have the human capacity to process and upload hundreds of videos. and where extra manual labor is needed, science's army of undergraduate research assistants (ras) can be employed to consolidate, edit, and upload the videos. volunteer ras who help with this task will be learning about their field's research as they do it.

according to a pilot study we conducted at a major scientific conference, traditional scientific posters received an average of 6.4 visitors, according to presenters' own subjective count (k.l. merlo et al., 2019, aha, conference). typically, posters are thrown away after those 6.4 visitors. encouraging or requiring poster presenters to upload their final posters to sites like figshare.com (a website that treats scientific figures as independent micro-publications) could bring thousands of additional virtual visitors to that poster.
posters uploaded to figshare are also assigned a digital object identifier (doi) link, which helps google discover them in searches, such as those searches conducted by researchers performing literature reviews in the future. these figshare poster publications can then link to video talks on youtube, youku, etc. and extended abstracts for more detail.

abstracts can be made more accessible as preprints

before creating their talk or poster, some fields ask scientists to submit 5- to 15-page extended abstracts containing details about the new research they would like to present. if these essays pass peer review, then the author is given permission to present them as a talk or poster. but what happens to the essay? right now, abstract essays are often left to rot behind paywalls and login walls, only discoverable by using the conference's limited, members-only search feature. these conference abstracts include details and references typically not found in the talk or poster and can be incredibly helpful to scientists who are conducting literature reviews in the future. for other fields, posters and presentations communicate later-stage research that is already in the process of being published, which often means that draft manuscripts of the research exist. requiring conference presenters to upload their extended abstracts or pre-published manuscripts to a preprint website (e.g., biorxiv.org) prior to submission would ensure that all conference abstracts and associated manuscripts are available open access and discoverable via google scholar and similar search engines.

social media platforms (e.g., twitter, tiktok, instagram) can dramatically increase the impact and reach of all forms of scientific content and may drive more member registrations and profits to the associated conferences.
social media statements are good advertising

already, innovative scientific conferences will ask scientists to submit a "social media statement" to be posted from the conference's social media accounts. here is a real example submitted by our first author: "want to find out how people talk about their work when they find it meaningful? come see our #siop19 poster at 2 p.m. on tuesday, board #252!" these social media statements help get the word out but typically do not contain any actual insight; they merely promise to teach the reader insight if they attend.

social media "microlearning" posts actively teach new science

the biggest leap forward in impact we can make when sharing science on social media is to use it for more than advertising: to actively teach people new research in bite-sized posts and link those mini-lessons to articles and talks for even deeper learning. instead of a purely "promotional" tweet like the above example, the author could include a key takeaway: "when people find their work meaningful, they talk about their work like it is part of their identity," and then link people to the extended abstract preprint, presentation video, and the conference event schedule for more. teaching people something valuable about a study in the span of a tweet or instagram post gives off a stronger information scent (pirolli and card, 1999). it establishes that tweet as providing a high rate of reward for a small investment of attention. counterintuitively, this may actually provide stronger bait and make people more likely to click through to the talk or article. at minimum, everyone who reads the tweet learns something, whether they click or not.

to maximize the engagement with science posted on social media, we can go beyond text-only posts and create social media "flipbook" mini-posters (see figure 2). essentially a silent presentation, these flipbooks can be 3-5 powerpoint slides illustrating the key aspects of a study in a visual format.
when posted on social media, the slides either auto-advance or allow scientists to swipe through them at their own pace. examples of researchers using this technique to share new science can be seen across many fields under the "#twitterposter" hashtag on twitter (also see morrison, 2020). as one prominent example, graduate students in the psychology department at university college dublin organized their own unofficial social media poster session full of these flipbook-style mini-posters that was so popular that their virtual poster session became a trending hashtag (twitter, 2020). this same "flipbook" design pattern would also probably make an excellent virtual poster experience apart from social media.

now that conferences are online, any format that can be imagined for scientific knowledge exchange can be realized with the right technical resources, including holding an entire conference in virtual reality, as the 2020 educators in vr international summit chose to do (vanfossen, 2020). the technical possibilities on the web are endless, but the goals of scientists who attend conferences are more finite. here are a few of them:

3. see what's going on in my field broadly (give serendipity a chance).
4. meet new people in my field.

it can help to focus less on translating traditional formats, and more on translating traditional goals. for example, the van andel institute (2020) met the social goals of a poster session by creating a chatroom-based poster session using slack. similarly, the saber community asked presenters to post 1-min personal summary videos next to each poster file (saber, 2020). to help researchers get updated on their focal area, conferences could use the data they already have on which topics each researcher has presented on and create short "new findings in your topic area" pages tailored to the individual.
how can conferences motivate busy scientists to create social media mini-posters, put our talks on youtube, upload our abstracts as preprints, and use all-new formats? easy: just nudge us to do it at exactly the right time, when we need something from the conference. scientists are typically too busy to volunteer extra effort in the name of impact, except when we are in the process of submitting to a conference. in that frantic state, we will jump through any hoop to get our work submitted. this is called the goal gradient hypothesis: the closer someone is to their goal, the more effort they will put in (kivetz et al., 2006). the goal gradient hypothesis suggests that we have a lot of leeway in what we can ask scientists to do at the submission stage. in our view, much of the above (e.g., asking for links to preprints of extended abstracts and manuscripts) would work well as requirements added to the process of submitting to a conference.

send impact reports to give scientists more meaning

it could be incredibly motivating to send scientists impact reports to help them see that, contrary to their expectations, people actually downloaded their posters and viewed their talks. to take this up to the level of the whole field, the "total views" for all content in the conference across all platforms could become a field-wide impact metric that can be tracked and celebrated year-over-year.

a field's annual conference is a crucial source of income for many scientific societies, which often motivates the "paywall model" for online conference content. one antidote to this is a "drip-release" model, where all content is made immediately available to paying members but is then scheduled to "drip" out to public repositories and social media over the next year. the world-renowned ted (technology entertainment design) conference uses exactly this type of drip-release model.
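as a rough illustration of how such a drip-release schedule could be automated, the sketch below assigns one public release date per item, one item per day by default, starting the day after the conference ends. the function name `drip_schedule`, the item titles, and the dates are hypothetical; nothing here is taken from ted or from any real conference system, and the actual uploading step is left out.

```python
from datetime import date, timedelta


def drip_schedule(titles, conference_end, per_day=1):
    """Assign each title a public release date after the conference.

    Hypothetical helper: releases `per_day` items per day, in the order
    given, starting the day after `conference_end`.
    """
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    schedule = []
    for index, title in enumerate(titles):
        # Integer division groups items into days: 0,0,1,1,... for per_day=2.
        release = conference_end + timedelta(days=1 + index // per_day)
        schedule.append((release, title))
    return schedule


# Illustrative data only; titles and the end date are made up.
talks = ["talk a", "talk b", "talk c"]
for release, title in drip_schedule(talks, date(2020, 9, 3)):
    print(release.isoformat(), title)
```

ted's own release schedule, described next, follows the same one-item-per-day pattern.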
ted publicly releases one talk per day on youtube (and its public website) over the year following its in-person conference, accompanied by social media posts with key takeaways (mcmanus, 2013). this content drip can be automated so that a year's worth of youtube uploads and social media posts can be put on autopilot with one initial burst of effort. the content drip keeps dues-paying members engaged and learning throughout the year and builds interest and excitement for next year's conference. it also strengthens the conference's brand image as a source of knowledge in that area.

many scientists are uncomfortable with the idea of making scientific content, especially conference content, more broadly accessible.

objection #1: the public won't understand that conference research is early stage

there is an assumption that scientists always critically evaluate conference research for its "potential," whereas the naive, gullible public will accept everything we say as gospel and immediately start bottling it into snake oil. remember that our current approach (in which the content disappears when the conference ends) is keeping new science away from many experts who would understand and benefit from it (for example, scientists in the field who could not attend that particular conference). is it worth keeping new science away from those people to coddle the public? the best long-term outcome is not keeping the public blind to raw science; it is teaching them to understand it. and if we want the public to better understand science, then at some point we cannot keep protecting them from it.

objection #2: we shouldn't dumb it down for social media

when worrying that a complex, nuanced study cannot possibly be boiled down to a tweet or a reddit post, it is important to note that science is already distilled successfully on social media on a large scale.
the "reddit.com/r/science" community posts single-sentence summaries of new findings (linked to study abstracts) for an audience of over 24 million subscribers. sometimes a finding on reddit.com/r/science has a summary that is oversimplified (e.g., neglects to say "in mice"), and in those cases, the top comment (often from a scientist in a related field) will typically correct the misimpression and provide the missing context. websites like reddit and twitter are populated by a mix of scientists and laypeople and are already capable of doing a pretty good job of summarizing new science. they can only improve if authors more actively shape the interpretation of their own work.

objection #3: i just don't feel confident enough in my presenting ability

we have pressed many scientists to share their work more broadly. often, after getting past the above objections, these researchers confessed something to the effect of "i just don't feel confident enough in my presenting ability to want people to see my talks and posters." happily, some of these same scientists have shown renewed interest in sharing their talks more widely after giving a talk they are especially proud of. if we can build more self-efficacy in scientists about communicating their work, then the rest of the objections may fall, given that research in work psychology has shown that feeling like you are good at something makes you more interested in doing it (m.d. zimmerman and b.m. wiernik, 2020, society for industrial and organizational psychology, conference).

available free, online, and easy to browse

if the above practices are implemented, the result will be tens of thousands of new scientific talks, posters, and papers flooding the public internet and social media every year.
every scientist will be able to quickly access any content from any conference that could aid their research (without needing to log in or request copies from authors), and millions of people searching public video sites like youtube and youku to learn about a new topic will begin to discover real science on it. making science more easily accessible to the world will not be without some negative consequences, but it could also accelerate the pace of discovery and build a direct line of communication between science and the public. all it takes to start this revolution is a few clicks of an upload button.

consumer technology association attendance audit summary
the goal-gradient hypothesis resurrected: purchase acceleration, illusionary goal progress, and customer retention
how long does it take for a ted talk to show up as a publicly viewable video
how to create a quick twitter poster to share new research (includes templates). youtube video
information foraging
"our virtual poster committee has done a fantastic job organizing posters and making them (and their authors' contact info) easy to find and appreciate!" twitter
traffic overview for youtube
#gifsfromyourgaff. hashtag search
lessons learned from hosting a vr conference

key: cord-284944-hcgfe9wv authors: silvin, aymeric; chapuis, nicolas; dunsmore, garett; goubet, anne-gaëlle; dubuisson, agathe; derosa, lisa; almire, carole; hénon, clémence; kosmider, olivier; droin, nathalie; rameau, philippe; catelain, cyril; alfaro, alexia; dussiau, charles; friedrich, chloé; sourdeau, elise; marin, nathalie; szwebel, tali-anne; cantin, delphine; mouthon, luc; borderie, didier; deloger, marc; bredel, delphine; mouraud, severine; drubay, damien; andrieu, muriel; lhonneur, anne-sophie; saada, véronique; stoclin, annabelle; willekens, christophe; pommeret, fanny; griscelli, frank; ng, lai guan; zhang, zheng; bost, pierre; amit, ido; barlesi, fabrice; marabelle, aurélien; pène, frédéric; gachot, bertrand; andré, fabrice; zitvogel, laurence; ginhoux, florent; fontenay, michaela; solary, eric title: elevated calprotectin and abnormal myeloid cell subsets discriminate severe from mild covid-19 date: 2020-08-05 journal: cell doi: 10.1016/j.cell.2020.08.002 sha: doc_id: 284944 cord_uid: hcgfe9wv summary blood myeloid cells are known to be dysregulated in the coronavirus disease 2019 (covid-19) caused by sars-cov-2. it is unknown whether the innate myeloid response differs with disease severity, and whether markers of innate immunity discriminate high risk patients. thus, we performed high dimensional flow cytometry and single cell rna sequencing of covid-19 patient peripheral blood cells and detected the disappearance of non-classical cd14lowcd16high monocytes, the accumulation of hla-drlow classical monocytes, and the release of massive amounts of calprotectin (s100a8/s100a9) in severe cases. immature cd10lowcd101-cxcr4+/- neutrophils with an immuno-suppressive profile accumulated as well in blood and lungs, suggesting emergency myelopoiesis.
we finally showed that calprotectin plasma level and a routine flow cytometry assay detecting decreased frequencies of non-classical monocytes could discriminate patients who develop a severe covid-19 form, suggesting a predictive value that deserves prospective evaluation.

coronavirus disease 2019 is caused by severe acute respiratory syndrome coronavirus 2 (sars-cov-2), which infects the lungs leading to fever, cough and dyspnea (guan and zhong, 2020). most patients presenting with mild disease develop an efficient immune response (thevarajan et al., 2020), but some go on to develop acute respiratory distress syndrome leading to admission in intensive care unit (icu), often culminating in multi-organ dysfunction and death. in addition to cell autonomous effects of the viral infection, a dysregulated immune response participates in the sudden deterioration of covid-19 patients, ultimately overwhelming infected and non-infected tissues (vabret et al., 2020). this overt inflammatory response centers around a cytokine storm (chen et al., 2020a) with elevated blood concentrations of interleukin-6 (il-6). accordingly, therapeutic agents targeting the il-6/il-6r-gp130 axis can alleviate the inflammatory response (michot et al., 2020) and ameliorate immune dysregulation (giamarellos-bourboulis et al., 2020), emphasizing the clinical significance of this cytokine. a marked lymphocytopenia is also associated with covid-19 severity (chen et al., 2020a); however, both the primary source of the cytokine storm and the mechanisms behind the lymphocytopenia remain elusive. a growing body of evidence points to dysregulation of innate immune cells of the granulomonocytic lineage during lung viral infections. a variety of human viruses infect monocytes and macrophages to spread through the tissues (al-qahtani et al., 2017; desforges et al., 2007; nottet et al., 1996; smith et al., 2004; yilla et al., 2005).
sars-cov-2 mrna is detectable in lung monocytes/macrophages of severe covid-19 patients (bost et al., 2020), though its ability to enter these cells in the peripheral blood and activate them directly remains unclear. also, tissue damage induced by sars-cov-2 infection may lead to the release of pathogen- and damage-associated molecular patterns that, in turn, induce the activation and recruitment of inflammatory cytokine- and chemokine-producing innate immune cells in an amplifying loop (liao et al., 2020). it remains unclear to what extent immune patterns associated with covid-19 pathophysiology are causative and exacerbate the disease and/or could be used for accurate patient stratification. here, using high dimensional single cell approaches including single cell rna sequencing, mass cytometry and 25-parameter spectral flow cytometry, we show that patients doomed to develop a severe disease exhibit a massive release of s100a8/s100a9 calprotectin accompanied by changes in monocyte and neutrophil subsets. we further discover that this pathological immune system reorganization is initiated by the onset of an emergency myelopoiesis that releases immature myeloid cells with an immunosuppressive phenotype into peripheral blood and lungs. together, our study integrates frequencies of non-classical monocytes and immature neutrophils with calprotectin plasma level as robust biomarkers of covid-19 severity, suggesting potential therapeutic strategies targeting calprotectin to alleviate severe covid-19.

this non-interventional study enrolled 158 patients (table s1) referred to the hospital with various flu-like symptoms, including 86 subjects diagnosed with covid-19 and 72 subjects not diagnosed with covid-19, based on positive and negative rt-pcr on pharyngeal swabs, respectively.
patients were stratified according to disease severity: mild disease (n=27) was defined as having no or limited clinical symptoms and not requiring ct-scan or hospitalization; moderate disease (n=16) was defined as being symptomatic, with dyspnea and radiological findings of pneumonia on thoracic ct scan, requiring hospitalization but with a maximum of 9 l/min of oxygen; and severe disease (n=43) was defined as respiratory distress requiring admission into the intensive care unit (icu). mild and moderate cases were mixed in the discovery part of the study and considered separately to explore the ability of a routine flow assay to discriminate patients that require hospitalization.

in order to explore changes induced by sars-cov-2 infection in circulating immune cell phenotype, we first collected peripheral blood samples from a discovery cohort of 13 patients positive for sars-cov-2 (hereafter "covid-19 patients") by rt-pcr and 12 patients suffering from flu-like symptoms but negative for sars-cov-2. the former group included 5 patients with mild disease and 8 patients with severe covid-19 (table s2). after red blood cell lysis, we labelled peripheral blood cells with a panel of 25 antibodies recognizing immune cell surface markers (key resource table) and analyzed them by spectral flow cytometry (figure 1a, s1a and s1b). by pooling the data from the 25 control and covid-19 patients and subjecting them to dimensionality reduction using the non-supervised umap algorithm, we identified populations of cd4+ t cells, cd8+ t cells, cd19+ b cells, cd14 high monocytes and cd15+ cd66b+ neutrophils (figure 1b and 1c). we also identified hla-dr high cd11b+ and cd16 high monocytes as well as neutrophils expressing cd11b, cd15, cd16 and cd64 (figure 1b and 1c).
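the severity stratification given at the start of this section can be written as a small decision rule. this is only a sketch under stated assumptions: the field names (`icu_admission`, `hospitalized`, `oxygen_l_per_min`) are hypothetical, the symptom and ct-scan criteria are omitted for brevity, and the text does not say how a hospitalized patient needing more than 9 l/min of oxygen outside the icu was handled, so that case is returned as "unclassified".

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields; the study's actual data dictionary is not given.
    icu_admission: bool
    hospitalized: bool
    oxygen_l_per_min: float


def stratify(patient: Patient) -> str:
    """Map a patient record to the severity groups named in the text."""
    if patient.icu_admission:
        return "severe"          # respiratory distress requiring icu admission
    if patient.hospitalized:
        if patient.oxygen_l_per_min <= 9:
            return "moderate"    # hospitalized, at most 9 l/min of oxygen
        return "unclassified"    # case not covered by the stated definitions
    return "mild"                # no hospitalization required


# Group sizes as reported in the text; they should add up to the 86
# covid-19 patients enrolled.
group_sizes = {"mild": 27, "moderate": 16, "severe": 43}
print(sum(group_sizes.values()))
print(stratify(Patient(icu_admission=False, hospitalized=True, oxygen_l_per_min=4.0)))
```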
analysis and visualization, applying umap dimensionality reduction to the cell surface marker expression datasets from control, mild and severe covid-19 groups, suggested differences in the repartition of cell populations (figure 1d). severe patients exhibited an expansion in the proportion of circulating neutrophils within the peripheral blood cell population (figure 1e), which was associated with an increase in their absolute number (table s2), as already reported. focusing on neutrophil subsets, we noticed a slight increase in the fraction of cd10 low cd101+ neutrophils in mild covid-19 patients (figure 1f), whereas the fraction of cd10 low cd101- neutrophils was remarkably amplified in severe patients, suggesting an accumulation of immature subsets of neutrophils (ng et al., 2019) in the peripheral blood of these patients (figure 1g and s1c-e). in severe patients, the absolute number of circulating monocytes (table s2) and the proportion of total monocytes among peripheral blood leucocytes (figure 1h and s1f) were similar to controls, but we noticed changes in monocyte subset repartition. the fraction of cd14 high cd16 high intermediate monocytes was significantly greater in mild covid-19 patients (16.95+/-6.75%) than in control (5.84+/-1.02%) or severe (6.77+/-1.10%) groups, while the non-classical cd14 low cd16 high monocyte fraction was lower in severe covid-19 patients (1.31+/-0.35%) than in mild (5.46+/-1.57%) or control groups (6.68+/-1.14%) (figure 1i and 1j). within the cd14 high cd16 low classical monocyte subset (figure s1g), we detected higher frequencies of cd11b high monocytes with increased disease severity (figure s1h), while the intensity of hla-dr expression was lower across both cd11b+ and cd11b- monocyte populations of severe covid-19 patients (figure 1k and 1l).
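the three monocyte subsets referred to above are distinguished by their cd14 and cd16 expression. the sketch below encodes that naming as a lookup on "high"/"low" calls. the function name is hypothetical, and the gating thresholds that turn fluorescence intensities into "high" or "low" are not given in the text, so the calls are taken as inputs rather than computed.

```python
def monocyte_subset(cd14: str, cd16: str) -> str:
    """Name the monocyte subset for a pair of 'high'/'low' marker calls."""
    subsets = {
        ("high", "low"): "classical",
        ("high", "high"): "intermediate",
        ("low", "high"): "non-classical",
    }
    # Any other combination is not one of the three subsets named in the text.
    return subsets.get((cd14, cd16), "undefined")


# Mean non-classical monocyte fractions (%) as reported in the text.
non_classical_pct = {"control": 6.68, "mild": 5.46, "severe": 1.31}
lowest = min(non_classical_pct, key=non_classical_pct.get)
print(monocyte_subset("low", "high"), "fraction is lowest in:", lowest)
```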
changes in myeloid cell repartition observed in severe patients were associated with lower frequencies of b cells compared to controls (p<0.001), and of cd4+ (p<0.001) and cd8+ t cells (p<0.01) relative to both controls and mild patients, while cd56+ nk cell frequencies remained comparable across all groups (figure 1m). altogether, these data suggested sars-cov-2-induced changes in the relative abundance of monocyte and neutrophil subsets within the peripheral blood cell population, with loss of non-classical cd14 low cd16 high monocytes, reduced expression of hla-dr on classical monocytes and a drop in cd101 and cd10 expression on neutrophils, characterizing severe cases.

as a second step in our discovery process, we collected peripheral blood samples from three control patients with flu-like symptoms who tested negative for sars-cov-2, and three sars-cov-2 positive patients, one outpatient with mild disease and two patients with severe disease admitted to icu (figure 2a and 2b; table s3). using the 10x chromium droplet-based platform, these samples were subjected to single cell rna sequencing (scrnaseq) immediately after collection and red blood cell lysis, without additional sorting or freezing, in an attempt to preserve fragile cell populations, mainly neutrophils. unsupervised clustering based on gene expression identified b and t cells as well as neutrophils, monocytes, erythroid cells and platelets (figure 2c and figure s2a and s2b). samples analyzed by scrnaseq were simultaneously analyzed by spectral flow cytometry for comparison (figure 2d). umap analysis of spectral flow cytometry data suggested lower proportions of cd4+ and cd8+ t cells while the neutrophil fraction was greater in severe patients compared to controls and to the unique mild patient (figure 2e and s2c). the three sars-cov-2-infected patients were sampled again 10 days later to monitor progression of the immune response in relation to clinical status (figure 2a to 2e).
umap visualization of monocytes analyzed by scrnaseq identified three clusters (figure 3a) that may correspond to well-defined monocyte subsets (guilliams et al., 2018). cells of cluster 1 expressed cd14, itgam (encoding cd11b) and klf4 while poorly expressing fcgr3a (encoding cd16), suggesting classical monocytes. cells of cluster 3, which expressed high levels of fcgr3a and low levels of cd14, may correspond to non-classical monocytes, and cluster 2, in which cells expressed both cd14 and fcgr3a, was suggestive of intermediate monocytes (figure 3a). differentially expressed genes (degs) and pathway analyses delineated a type i interferon signature in the mild covid-19 patient monocytes (figure 3b, s3a and s3b; table s4). this signature was less pronounced in the two severe covid-19 samples, contrasting with the elevated expression of genes involved in the production of reactive oxygen species (ros) and nitric oxide synthase (nos) (figure s3a and s3b). a non-supervised umap analysis of the data collected by spectral flow cytometry of the same samples detected variations in monocyte subset distribution among patients: compared to controls and the mild patient, severe patient #1 showed a lower fraction of cd14 low cd16 high non-classical monocytes at day 0 while the other severe patient showed a high level of this monocyte fraction (figure 3c). additionally, the two severe patients showed markedly higher levels of classical cd14 high cd16 low monocytes, expressing more cd141 (thbd) at their surface (figure 3d), in accordance with the scrnaseq analysis (table s4). in the mild patient, one of the most highly expressed genes in classical monocytes was the interferon-stimulated gene siglec-1 (sevelsted et al., 2015), consistent with the high level of expression of cd169, the corresponding protein, at the surface of classical monocytes at day 0 (figure 3e).
ten days later, siglec-1 gene expression was down-regulated and cd169 expression was undetectable at the surface of hla-dr high classical monocytes (figure 3b and 3e). the two severe patients exhibited low expression of hla-dr protein on their monocyte surface at day 0, without significant change at day 10 (figure 3e). to validate these discovery experiments, we performed mass cytometry analysis of an independent cohort of 12 patients (four in each group: control, mild and severe) (table s5), which showed a lower fraction of cd14 low cd16 high non-classical monocytes in severe compared to mild patients (figure 3f and 3g). in accordance with pathway analysis of scrnaseq data highlighting nf-κb activation as a prominent feature in monocytes of severe patients (figure 3b and s3b), we observed significantly higher levels of the phosphorylated transcription factor rela/p65 (p-p65), a critical effector of the canonical nf-κb pathway, in hla-dr low cd14 high classical monocytes from severe patients compared to controls (figure 3h and 3i). we also measured p-p65 expression in circulating cd34 + cells, identifying increased expression in severe disease (figure s3c). umap analysis of neutrophils identified two clusters (figure 4a). we observed an increase of cluster 2 cells in severe covid-19 patients (figure 4b). cluster 1 expressed the il1r2 gene, whereas cluster 2 also expressed high levels of s100a8 and s100a9, cxcr4, sell and spi1 (figure 4c and s4a). degs and pathway analyses in mild patient neutrophils indicated a type i interferon response at day 0 that was lost by day 10 (figure 4d, s4b and s4c). this signature was absent in controls and also in the two severe patient samples collected at later time-points (figure 4d), which showed high expression of genes involved in the production of ros, the inducible nos pathway, il-1 signaling and nf-κb activation pathways (figure s4b and s4c).
analysis of the data collected by spectral flow cytometry of the same samples distinguished cd10 + cd101 + mature neutrophils from cd10 low cd101 - immature neutrophils. at day 0, the two severe patients had more circulating cd10 low cd101 - immature neutrophils compared to controls or the mild patient (figure 4e). severe patient #1 neutrophils had increased expression of cd101 on their surface at day 10, while severe patient #2 neutrophils retained their immature phenotype at day 10. focusing on the expression of the pre-neutrophil hallmark cxcr4 at the surface of cd10 low cd101 - immature neutrophils (ng et al., 2019), we observed an increase in the proportion of neutrophils with a cd10 low cd101 - cxcr4 + phenotype, which are presumably pre-neutrophils (figure 4f). mass cytometry analysis of an independent cohort of 12 patients (four controls, four mild and four severe covid patients, table s5) again suggested a higher fraction of cd10 low cd101 - immature neutrophils in severe patients compared to control patients (figure 4g and 4h). altogether, results of these exploratory scrnaseq experiments identified a transient type i interferon response in cells of a patient with mild disease and the presence of phenotypically immature subsets of monocytes and neutrophils in two patients with severe disease, which were further identified by mass cytometry.

calprotectin plasma levels distinguish mild from severe covid-19 patients

s100a8 and s100a9 alarmins, representing ~45% of the cytoplasmic proteins in neutrophils, are released under inflammatory conditions and form a stable heterodimer known as "calprotectin" (wang s et al., 2018). in accordance with preliminary results generated by scrnaseq (figure s4a), rt-qpcr analysis detected higher expression of s100a8 and s100a9 genes in peripheral blood nucleated cells of patients with severe covid-19 (n=8) compared to controls (n=8) and patients with mild disease (n=16) (figure s5a and table s5).
this led us to measure the plasma level of calprotectin, together with type i interferon (ifnα) and 40 other cytokines and chemokines, in samples from a cohort of 84 patients (table s6). as observed in figure 5a, patients with mild disease showed significantly less cxcl8 (figure 5c and s5b) and significantly more type i ifnα (figure 5a and 5c) compared to controls. severe patients exhibited dramatically higher calprotectin levels compared to mild covid-19 patients or controls, without further increase in ifnα plasma level above mild disease levels (figure 5b, 5c and s5c). calprotectin was the most significantly increased circulating biomarker in severe patients, accompanied by a rise in 23 other tested chemokines and cytokines, including cxcl-8, cxcl-12 and il-6 (figure 5b, 5c and s5b). age and comorbidities (including obesity, diabetes mellitus, cardiovascular and respiratory diseases and cancer) are predictors of severe covid-19 disease (richardson et al., 2020). we found that plasma calprotectin levels were significantly higher in control patients with comorbidities, as well as in mild or severe covid-19 patients with comorbidities (figure 5d). nevertheless, the increase in calprotectin in severe covid-19 far exceeds that associated with comorbidities. none of the other measured circulating proteins were significantly higher in patients with comorbidities (figure 5e). bacterial infections can occur in severe covid-19 (chen et al., 2020c; llitjos et al., 2020) and were present in some of our severe patients but did not significantly modify the profile of released proteins (figure s5b and s5c), including calprotectin (figure 5f). no correlation between calprotectin and age was observed in any group of patients (figure s5d).
calprotectin concentration correlated with neutrophil counts (figure 5g), fibrinogen plasma levels (figure 5h) and d-dimers (figure 5i), the latter being fibrin degradation products reflecting a hypercoagulability state. when calprotectin plasma level was modeled using multivariable linear regression to take into account potential confounding factors (age, sex, comorbidities) together with neutrophil count, fibrinogen and d-dimers, these associations remained statistically significant (neutrophils p-value=1.154e-04; fibrinogen p-value=5.688e-05; d-dimers p-value=2.099e-03). we also uncovered a weak correlation between il-6 plasma concentration and levels of calprotectin (figure s5e), blood neutrophil count (figure 5j), fibrinogen (figure 5k) and d-dimers (figure 5l), which disappeared after adjustment by multivariable linear regression. finally, a logistic regression including age, sex and comorbidities together with biological parameters identified plasma levels of calprotectin, cx3cl1, cxcl11 and cxcl13 as the parameters that best discriminate controls / mild covid-19 patients from severe patients. these results indicate that high plasma levels of calprotectin are seen in severe covid-19 patients, not in those with mild disease. importantly, this increase is independent of confounding factors for prognosis such as advanced age, comorbidities or concurrent bacterial infection, which have only minor effects on plasma calprotectin levels. the scrnaseq-based identification of cd37, cd63 (lamp3), cd169 (siglec-1) and cd184 (cxcr4) as biomarkers of blood cell subsets whose relative proportions differ in mild and severe covid-19 patients prompted us to add antibodies targeting these proteins to the spectral flow cytometry panel. we applied this new panel to samples from an independent validation cohort of 90 patients.
this cohort included 48 control patients and 42 covid-19-positive patients, of whom 16 had mild disease and 26 had severe disease (figure 6a and table s6). non-supervised analysis and umap visualization identified the main cell populations in the three categories of patients combined (figure s6a and s6b). analyzing patients individually confirmed the significant decrease in b cell, cd4 + t cell and cd8 + t cell fractions in severe patients compared to control and mild groups (figure 6b), which may be a consequence of the increased neutrophil fraction (figure 6c) and absolute number (table s6). within neutrophils, we observed more specifically a shift toward cd10 low cd101 - neutrophils (figure 6d and 6e) and the subset of cd10 low cd101 - neutrophils that express cxcr4 (cd10 low cd101 - cxcr4 + cells) (figure 6f) that we previously observed in severe patients. finally, the fraction of cd10 low cd16 low neutrophils was also higher in severe patients (figure s6c), further suggesting the accumulation of immature neutrophils in the blood of severe covid patients. scrna-seq analyses of monocyte subsets had indicated differential changes in the distribution of non-classical cd14 low cd16 high monocyte fractions between the two patients with severe disease (see figure 3c). since samples were collected from patients at various time points after admission to icu, we asked whether the duration of icu stay affects the monocyte subset distribution. in the 26 severe patients of this cohort (table s6), we observed a significant correlation between the time spent in icu and the fraction of the non-classical monocyte subset, irrespective of the presence or absence of concurrent bacterial infection (figure 6g). mean time spent in icu was 5.46 days for patients with less than 5% non-classical monocytes compared to 8.83 days for those with 5% or more non-classical monocytes (figure 6h and 6i).
then, we examined other monocyte subsets: in the majority of mild patients, we observed a fraction of classical monocytes that expressed cd169, which was decreased in severe patients (figure 6j, 6k and s6d). cd169 expression correlated with plasma levels of ifnα (figure s6e). independent of the time they spent in icu, severe patients also showed a larger fraction of classical monocytes expressing high levels of cd141 compared to controls (figure 6j and 6l) and of monocytes expressing low levels of hla-dr compared to controls and mild patients (figure s6f). finally, the time spent in icu did not significantly affect the distribution of lymphocyte populations or neutrophil subsets (figure s6f and s6g). thus, severe covid-19 patients exhibited a transient decrease in non-classical monocyte frequencies, a stable decrease in hla-dr low cd141 + classical monocytes and a major increase in cd10 low cd101 - cxcr4 +/- immature neutrophils. we next investigated whether changes in circulating myeloid cell phenotypes could be used to discriminate patients who develop severe covid-19. within our previous cohort, we separated mild (n=12) from moderate (n=6) and severe (n=27) patients using clinical criteria. patients classified as "moderate" demonstrated intermediate changes between those of mild outpatients and severe patients in icu (figure s7a). the fraction of cd10 low cd101 - neutrophils in moderate patients was intermediate but not significantly different from any group (figure 7a). however, the amount of calprotectin measured in moderate covid-19 patients was significantly higher than in mild outpatients but still significantly lower than in severe covid-19 patients (figure 7b). in comparison, ifnα levels were not significantly different between moderate and mild or severe patients (figure s7b). the non-classical monocyte fraction differed significantly between mild and moderate patients, dropping in moderate patients to levels comparable to severe cases (figure 7c).
thus, we hypothesized that the decreased non-classical monocyte fraction could be used as a fast and simple diagnostic test to distinguish moderate from mild covid-19 cases, especially in cases where clinical symptoms may substantially overlap. this would facilitate rapid and accurate identification of patients currently classified as "mild" who are at the cusp of potentially progressing to more severe disease. we therefore employed a low dimensional flow cytometry approach that measures the fraction of classical (cd14 high cd16 low), intermediate (cd14 high cd16 high) and non-classical (cd14 low cd16 high) monocyte subsets among total peripheral blood monocytes, and applied it initially to a learning cohort of 98 patients, consisting of 16 mild, 10 moderate and 16 severe covid-19 patients, along with 56 controls (table s7). all hospitalized patients were sampled within ten days of admission to limit the potential impact of time in icu (see figure 6g); the mean time spent in icu was 5.5 days at the point of sampling. patients with mild disease showed a fraction of non-classical monocytes similar to that observed in controls. in contrast, moderate patients showed lower levels of non-classical monocytes, as observed in severe patients (figure 7d). to measure the global performance of this test, we used a receiver operating characteristic (roc) curve (hajian-tilaki, 2013). the point of the roc curve corresponding to the best sensitivity/specificity compromise indicated that a non-classical monocyte fraction below 4% separated patients with moderate or severe covid-19 from those with mild or no disease with 76.9% sensitivity (95% bootstrap confidence interval (bci) [61.5%; 92.3%]) and 89% specificity (95% bci [80.6%; 95.8%]) (figure s7c).
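as an illustration of how such a cutoff and its bootstrap confidence intervals can be derived, the following is a minimal python sketch, not the analysis code used in this study. it assumes that the "best sensitivity/specificity compromise" is the point maximizing youden's j (sensitivity + specificity - 1) and that the intervals are percentile bootstrap intervals; neither choice is stated in the text. the function names and the toy values are hypothetical and do not come from the study cohorts.

```python
import random


def sens_spec(fractions, labels, cutoff):
    """Sensitivity and specificity when a sample is called positive
    if its non-classical monocyte fraction (%) is below `cutoff`.
    labels: 1 = moderate/severe disease, 0 = mild disease or control."""
    tp = sum(1 for f, y in zip(fractions, labels) if y == 1 and f < cutoff)
    fn = sum(1 for f, y in zip(fractions, labels) if y == 1 and f >= cutoff)
    tn = sum(1 for f, y in zip(fractions, labels) if y == 0 and f >= cutoff)
    fp = sum(1 for f, y in zip(fractions, labels) if y == 0 and f < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(fractions, labels):
    """Scan midpoints between consecutive observed values and return the
    cutoff maximizing Youden's J (assumed criterion, see lead-in)."""
    values = sorted(set(fractions))
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(cuts, key=lambda c: sum(sens_spec(fractions, labels, c)))


def percentile(sorted_values, q):
    """Simple nearest-rank style percentile on an already sorted list."""
    return sorted_values[int(q * (len(sorted_values) - 1))]


def bootstrap_ci(fractions, labels, cutoff, n_boot=2000, seed=0):
    """95% percentile bootstrap intervals for sensitivity and specificity
    at a fixed cutoff, resampling patients with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(fractions, labels))
    sens_samples, spec_samples = [], []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        ys = [y for _, y in sample]
        if 0 not in ys or 1 not in ys:
            continue  # skip resamples that lack one of the two classes
        s, p = sens_spec([f for f, _ in sample], ys, cutoff)
        sens_samples.append(s)
        spec_samples.append(p)
    sens_samples.sort()
    spec_samples.sort()
    return (
        (percentile(sens_samples, 0.025), percentile(sens_samples, 0.975)),
        (percentile(spec_samples, 0.025), percentile(spec_samples, 0.975)),
    )


# Hypothetical toy data: non-classical monocyte fractions (%) and labels.
toy_fractions = [1.0, 2.0, 2.5, 3.0, 3.5, 5.0, 3.8, 4.5, 6.0, 6.5, 7.0, 8.0]
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

cutoff = best_cutoff(toy_fractions, toy_labels)
sens, spec = sens_spec(toy_fractions, toy_labels, cutoff)
sens_ci, spec_ci = bootstrap_ci(toy_fractions, toy_labels, cutoff)
print(cutoff, sens, spec, sens_ci, spec_ci)
```

in practice the cutoff selected on a learning cohort would then be held fixed and evaluated on an independent cohort, since a threshold tuned and scored on the same samples tends to look better than it is.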
we then applied these analyses to blood samples from an independent validation cohort of 24 hospitalized patients from a different clinical center (10 controls, 3 mild, 4 moderate and further confirming these observations, serial sampling of two severe patients who responded to anti-il-6r antibodies documented that their clinical recovery was associated with the reappearance of non-classical monocytes in the blood (figure s7d). in addition, one patient who was referred initially with limited symptoms (atypical thoracic pain) and was sars-cov-2 pcr negative unexpectedly exhibited a low fraction of non-classical monocytes (3.4%), accompanied by 10% hla-dr low classical monocytes. the following day, pulmonary symptoms appeared, the patient was hospitalized requiring oxygen therapy, and a lung ct scan revealed characteristic covid-associated injury. such cases suggest that the loss of the non-classical monocyte fraction could be a strong indicator of existing or impending severe covid-19. additional informative parameters could be added to this flow assay to increase its specificity in identifying a transition to severe covid-19, including a decreased expression of hla-dr at the surface of classical monocytes (figure s7e) that is associated with a decrease in non-classical monocyte fraction below 4% (figure s7f), and an increase in the fraction of cd16 low neutrophils (figure s7g). comparison of roc curves indicated that calprotectin plasma level and monocyte or neutrophil subset analyses distinguished mild covid-19 in outpatients from moderate or severe disease in hospitalized patients, while ifnα2a plasma level did not (figure s7h). together with calprotectin plasma level, flow identification of a decrease in non-classical monocyte fraction below 4% of total monocytes could provide improved resolution to clinical observations when categorizing patients at the borders of mild and moderate/severe covid-19.
this would potentially identify those individuals at greatest risk of rapid decline and highlight the need for pro-active management/intervention and intensive monitoring. this assay could be reinforced by analysis of hla-dr low classical monocyte and cd16 low neutrophil fractions. the lungs are a major organ affected in severe covid-19 patients. to better understand how the distinctive cell signatures found in the blood of severe covid-19 patients, particularly the presence of immature neutrophils and hla-dr low monocytes, affected immune cell compartments in the lungs, we integrated our dataset using the seurat v3 pipeline (stuart et al., 2019) with the published scrnaseq dataset of cells from 12 bronchoalveolar lavage fluids (balf) of control (n=3), mild (n=3) and severe (n=6) covid-19 patients (liao et al., 2020; gse145926) . this analysis provided an unbiased global map of immune cells in the blood and balf of control, mild and severe covid-19 patients. using dimensional reduction, we identified 5 regions based on degs across pooled data from all samples (figure 7f and s7i), including t cells (characterized by the expression of genes including nkg7, cd8a, cst7, gzmb and gzma), b cells (iglv3-19, ighv4-34, ighg1, igha1 and jchain), neutrophils (g0s2, rsad2, il1r2 and il1rn), alveolar macrophages (apoe, msr1, marco and fbp1) and monocytes/macrophages (fn1, cxcl10, cd68 and nupr1). validating this approach, the alveolar macrophage region was mainly present in balf of control patients but was dramatically decreased in mild and severe covid-19 patients and only one cell from our blood scrnaseq matched in this region (figure 7f and 7g) . we also observed changes in the monocyte/macrophage region of the balf from mild or severe patients versus controls and a dramatic neutrophil accumulation in severe disease ( figure 7g ). 
monocytes/macrophages were increased in balf of mild compared to control and severe groups (figure 7h and 7i), and these cells were characterized by the expression of the isgs (siglec-1, ifi44 and ifitm3) (figure 7j), with pathway analyses indicating the upregulation of viral replication and interferon type i signaling pathways. in contrast, nos biosynthetic process and monocyte chemotaxis were upregulated in balf monocytes/macrophages of severe patients (figure s7j) that, similar to blood monocytes, expressed lower levels of hla-dra and hla-drb1 and higher levels of nfkbia mrna compared to controls or mild covid-19 patients (figure 7j). finally, neutrophils were present at high frequencies in balf from severe covid-19 patients but not in balf from controls or mild patients (figure 7k and 7l). umap integration of severe patient samples indicated that balf neutrophils, similar to blood neutrophils (figure 7l), were characterized by high expression of s100a8 and s100a9 as well as cxcr4, indicating an immature state (figure 7m, 7n and s7k). altogether, integration of blood and balf myeloid cells identified in severe covid-19 patients the loss of hla-dra and hla-drb1 and high nfkbia expression in monocytes/macrophages (not including alveolar macrophages), together with an accumulation of neutrophils expressing high levels of s100a8/a9 and cxcr4. this study presents evidence that patients who develop severe covid-19 exhibit high levels of calprotectin and inflammatory cytokines and chemokines correlating with an emergency myelopoiesis generating ros- and nos-expressing immunosuppressive myeloid cells (hla-dr low monocytes and immature subsets of neutrophils). the first line of defense in virus-infected patients typically involves a protective innate response incorporating the transient and strong production of type i ifns.
by inducing expression of isgs, type i ifns inhibit virus replication and promote an effective innate and adaptive immune response (thevarajan et al., 2020; totura and baric, 2012). this antiviral response may be overwhelmed in covid-19 patients who suddenly progress to clinically threatening disease (hadjadj et al., 2020). severe covid-19 frequently develops in the context of advanced age and comorbidities that provide a degree of underlying systemic chronic inflammation (furman et al., 2019). such inflammation could disrupt the timing of the type i ifn response relative to the kinetics of virus replication (teran-cabanillas and hernandez, 2017), which was shown to be critical in mouse models of coronavirus infection (channappanavar et al., 2019). an imbalance between the type i ifn and inflammatory responses could also be favored by the highly efficient replication of sars-cov-2 in human tissues (chu et al., 2020), and by the ifn-neutralizing effects of structural and non-structural viral components shared between sars-cov-2 and other virulent human coronaviruses (chen et al., 2014; yang et al., 2015). severe covid-19 patients exhibit an abnormal distribution of circulating monocytes and of neutrophils expressing s100a8 (calgranulin a / myeloid-related protein 8) and s100a9 (calgranulin b / myeloid-related protein 14) alarmin genes. importantly, an accumulation of neutrophils expressing high levels of s100a8/a9 genes was also observed in the balf of these patients. the release of massive amounts of calprotectin, the heterodimer formed by s100a8 and s100a9 proteins, is a striking event associated with severe covid-19. this heterodimer promotes cell migration and boosts nadph (nicotinamide adenine dinucleotide phosphate) oxidase activity.
calprotectin is a tlr4 and rage (receptor for advanced glycation end products) ligand that, upstream of tnfα (vogl et al., 2018) and cxcl8 (simard et al., 2014) synthesis and secretion, promotes nf-κb activation (riva et al., 2012) and the secretion of multiple inflammatory proteins such as il-6 (wang et al., 2018). thus, we propose that calprotectin may account for, and possibly trigger, the cytokine release syndrome that characterizes severe covid-19. its production may be amplified by tissue damage, generating a harmful hyper-inflammation loop (kuipers et al., 2013) that precludes these peptides from exerting more protective functions (austermann et al., 2014; freise et al., 2019; ulas et al., 2017; vogl et al., 2018). chronic inflammation from comorbidities may synergize with sars-cov-2 viral infection to induce a systemic release of calprotectin, which translates into the up-regulation of nf-κb, the loss of hla-dr on classical monocytes and the presence of immature neutrophils, altogether converging to a state of chronic inflammation-induced immunosuppression. abnormal neutrophils were observed previously in severe covid-19 patients (wilk et al., 2020). however, the authors concluded that these neutrophils transdifferentiate from b cells. we have no results supporting this hypothesis. in healthy conditions, roughly 85% of total circulating monocytes are cd14 high cd16 low hla-dr high cells that are rapidly recruited to inflamed tissues (guilliams et al., 2018). as in other severe illnesses (lukaszewicz et al., 2009), the expression of hla-dr on cd14 high circulating monocytes is low in severe covid-19, which correlates with, and could be mediated by, il-6 overproduction (giamarellos-bourboulis et al., 2020). a more specific feature of covid-19 is the low fraction of cd14 low cd16 high non-classical monocytes. this fraction commonly increases in patients with sepsis and inflammatory diseases, including viral infections (kratofil et al., 2017).
the decrease in non-classical monocyte fraction could involve the ability of calprotectin to accelerate the trans-endothelial migration of leucocytes (fassl et al., 2015), unless these cells strongly adhere to the endothelium, or the conversion of classical into non-classical monocytes is blocked (hanna et al., 2011; hofer et al., 2015; selimoglu-buet et al., 2018). whatever the mechanism, the lower than normal frequencies of non-classical monocytes (thevarajan et al., 2020; hadjadj j et al., 2020) suggest a sars-cov-2 characteristic effect that is not observed in other viral infections. most importantly, this decrease generates a highly characteristic biological signature of covid-19's aggressive form, with the potential to be easily measured using standard diagnostic flow cytometry and to provide information on the real-time immunological severity of the infection. the burst of calprotectin detected in covid-19 patients may trigger nf-κb-driven emergency myelopoiesis, generating immature and dysplastic cells (basiorka et al., 2016; chen et al., 2013). given the considerable hematopoietic potential of the lung (lefrancais et al., 2017), the burst of calprotectin could also promote the contribution of lung megakaryocytes to disease pathogenesis in this organ. whatever the mechanism, the immature and mature cells released into the peripheral blood by emergency myelopoiesis may be endowed with immunosuppressive functions, suggesting that myeloid-derived suppressor cells (mdscs), as detected in cancer, inflammation and other diseases (veglia et al., 2018), might be important in covid-19. in addition to hla-dr low monocytes, whose phenotype is that of monocytic mdscs (m-mdscs), cd10 low cd101 - cxcr1 + immature cells are reminiscent of granulocytic mdscs (g-mdscs) (aarts et al., 2019; mastio et al., 2019; veglia et al., 2018).
thus, neutrophil precursors such as the pre-neutrophil (preneu) population that are cxcr4 positive (evrard et al., 2018) may be released into the blood from the bone marrow prematurely and infiltrate the lung tissue in severe patients. the emergence of these populations could be a predictor of a switch to severe disease. further research will be required to determine their specific role in disease development. altogether, we observed that severe covid-19 is specifically associated with 1) a burst of circulating calprotectin that precedes cytokine release syndrome, 2) low levels of non-classical monocytes in the peripheral blood, and 3) an emergency myelopoiesis that releases immature and dysplastic myeloid cells with an immune suppressive phenotype. monitoring of calprotectin plasma level and non-classical monocytes in the blood of patients could be implemented in a routine lab to identify patients with early immunological signs consistent with developing more severe disease, as recently suggested (chen et al., 2020b). finally, in addition to the network of potential drug targets recently depicted by analysis of sars-cov-2 interactions (gordon et al., 2020), our work provides further rationale for the testing of several clinical strategies, including: blocking emergency myelopoiesis with lenzilumab (nct04351152), a recombinant anti-human gm-csf antibody (patnaik et al., 2020); testing the oral quinoline-3-carboxamide tasquinimod (fizazi et al., 2017) and related molecules such as abr-215757 (paquinimod), which blocks the binding of s100a9 to tlr4 and rage (kraakman et al., 2017; raquil et al., 2008); and the preclinical anti-cd33 monoclonal antibodies (walter, 2018), which may prevent the interaction of s100a9 with myeloid progenitors (eksioglu et al., 2017).
these analyses provide snapshots of the differences in innate immune cell phenotype and calprotectin plasma level between outpatients with mild disease at the time of sampling, having no or limited clinical symptoms and not requiring ct scan or hospitalization, and moderate to severe patients whose clinical situation requires hospitalization and, in most cases, oxygen supply. although all the statistical analyses indicate that these biomarkers efficiently discriminate these two clinical situations and may help with urgent patient triage, a prospective serial analysis is now required to evaluate how these biomarkers can predict the switch from mild to moderate or severe covid-19 and inform on the mechanisms involved in this switch.

non-supervised umap analysis of data from 25 patients (controls=12; mild=5; critical=8); c. cell surface marker expression within umap analysis shown in panel b; d. non-supervised umap analysis of patient blood samples in control, mild and severe groups; e. percentage of neutrophils within total cells in each individual sample within indicated patient groups; f. partition of neutrophil subsets, based on cd101 and cd10 expression, within each patient group (data pooled per group); g. percentage of cd10 low cd101 +/- neutrophils among total neutrophils, as in e; h. percentage of monocytes within total cells, as in e; i. partition of monocyte subsets in each individual sample within patient groups, based on cd14 and cd16 expression (left panels) or cd11b and hla-dr expression (right panels); j. fractions of non-classical monocytes within total monocytes, as in e; k. cd11b and hla-dr expression on classical monocytes within each patient group (data pooled per group); l. percentage of hla-dr low classical monocytes within classical monocytes, as in e; m. percentage of b, cd4 + t, cd8 + t, and nk cells within total cells, as in e; kruskal-wallis test, * p<0.05; ** p <0.01; *** p<0.001; ns, non-significant.
two blood samples were collected ten days apart from 3 covid-19 patients. blood was also collected once from 3 outpatient controls whose sars-cov-2 rt-pcr was negative. individual cell mrnas were sequenced using chromium 10x technology; b. time line of sample collection in the three patients; further details in table s4 figure 2a , and violin plots of gene expression in three statistically defined clusters; b. heatmap of differentially expressed genes (degs; logfc ± 0.25; fdr<0.05) in total monocytes; columns labelled "0" identifies degs generated by comparing each patient sample at day 0 to the pool of the three controls and the two other patient samples at day 0; columns labelled "10" identifies the expression of these genes in each patient sample at day 10 compared to day 0; genes shown in table s4 . c-e. spectral flow analysis in pooled controls and each individual patient sample at day 0 and day 10 of monocyte subset partition in samples analyzed by scrnaseq (c), cd11b and cd141 expression among classical monocytes (d), and cd169 and hla-dr expression among classical monocytes (e); f-i. mass cytometry analysis of monocyte subsets in 4 patients within each group (pooled data) (f); non-classical monocyte fraction among total monocytes in each individual sample within the 3 groups (g); p65/nf-κb expression in hla-dr low classical monocyte subset as in f (h); fraction of p65/nf-κb high hla-dr low classical monocytes among classical monocytes as in g (i). figure 2a ; b. umap profile of neutrophils within the 3 controls, the mild and the two severe cases with the cluster gates overlaid; c. violin plots of indicated gene expression in two statistically defined neutrophil clusters; d. heatmap of degs in total neutrophils generated as described in figure 3b ; e-f. 
spectral flow analysis in pooled controls and each individual patient sample at day 0 and day 10 of neutrophil subsets, based on cd10 and cd101 expression (e), and cxcr4 and cd11b expression among cd10 low cd101 - neutrophils (f) in indicated samples (pooled controls). g-h. mass cytometry analysis of neutrophil subsets in 4 patients within each group (pooled data) as in figure 3f-i, based on cd10 and cd101 expression (g) and the fraction of cd10 low cd101 - neutrophils among total neutrophils in each sample within the 3 groups (h).
plasma levels of calprotectin (s100a8/s100a9), interferon alpha (ifnα2a) and 40 cytokines and chemokines in blood samples collected from 84 patients (controls, 40; mild disease, 18; moderate or severe disease, 25). a. volcano plot representation of cytokine levels in mild covid-19 compared to controls; ifnα2a shown in orange; b. volcano plot representation of cytokine levels in severe covid-19 compared to control patients; ifnα2a shown in orange; calprotectin, cxcl11, cxcl13 and cx3cl1, shown in red, are the most significantly associated with severe forms; c. circulating levels of cxcl8, ifnα2a, calprotectin and il-6 in individual samples within each group; d. impact of comorbidities (see table s6) on calprotectin plasma level within each group; e. volcano plot representation of cytokine levels in severe patients with and without comorbidities; f. impact of bacterial infections on calprotectin plasma level within each group; g-i. spearman correlations between calprotectin plasma level and neutrophil count (g), fibrinogen (h), and d-dimers (i); j-l. spearman correlations between il-6 plasma level and neutrophil count (j), fibrinogen (k), and d-dimers (l). wilcoxon rank-sum test, * p<0.05; ** p<0.01; *** p<0.001; **** p<0.0001; ns, non-significant.
table s3-s5. monocyte analysis by single cell rna sequencing, spectral flow cytometry and mass cytometry. a.
pathway analysis generated by comparing degs in monocytes of each sars-cov-2 positive patient to the same population in the three control patients considered together using ipa software (mild patient in blue, severe #1 in red, severe # 2 in orange); b. the same degs were used to perform a gene ontology network analysis using cluego software, considering the two severe patients together; c. combined (left panel) and individual (right panel) mass cytometry analysis of p65/nf-κb expression in circulating cd34 + cells in each group. figure 4 ; table s3 -s5. neutrophil analysis by single cell rna sequencing, spectral flow cytometry and mass cytometry. a. heatmap of the top 20 degs defining two neutrophil clusters. b. pathway analysis generated by comparing degs in neutrophils of each sars-cov-2 patient to the same population in the three control patients considered together using ipa software (mild patient in blue, severe #1 in red, severe # 2 in orange); c. the same degs identified in neutrophils were used to perform a gene ontology network analysis using cluego software, considering the two severe patients together. rt-qpcr analysis of s100a8 and s100a9 gene expression in the three groups of patients, using hprt as a control gene; b. heatmap of cytokines, chemokines, ifnα2a and calprotectin plasma levels in 37 covid-19 patients compared to 40 controls. sars-cov-2-positive patients included 14 mild and 23 severe patients. associated bacterial infection at sample collection is indicated in purple. the heatmap shows z score-normalized concentrations, each column represents one patient and each row one protein; the color gradient from blue to red indicates increasing concentrations. rows and columns are clustered using correlation distance and average linkage; c. volcano-plot representation of cytokine levels in severe sars-cov-2 patients with (n=11) or without (n=14) bacterial infection at the time of sample collection; d. 
spearman correlation between calprotectin concentration and age showing control patients in green, mild in orange and severe in red; e. spearman correlation between il-6 and calprotectin concentrations (color code as in d). figure 6 . related to figure 6 ; table s6 . validation of innate immune signature of severe covid-19. a-b non-supervised umap representation generated by pooling data from all the patient samples; cell identification (a) surface marker expression (b); c. bar plots representing the percentage of cd10 low cd16 low neutrophils among all neutrophils in individual patients from each group in the validation cohort (n=90). d. spearman correlation between cd169 (siglec-1) mean fluorescence intensity (mfi) and days spent by severe patients in icu. e. spearman correlation between cd169 (siglec-1) mean fluorescence intensity (mfi) and plasma ifnα concentration; yellow, mild covid-19 patients; red, severe covid-19 patients. f.g. bar plots representing the percentage of hla-dr low classical monocytes, b cells, cd4 + and cd8 + t cells and nk cells (f) and neutrophils among cd45 + cells, cd10 low cd101neutrophils among all neutrophils and cd10 low cxcr4 + neutrophils among cd10 low cd101neutrophils (g) in individual patients from each group in the validation cohort (n=90). figure 7 ; table s6-s7. changes in innate immune cell phenotype are detected in moderate covid-19 patients. a. bar plots representing the percentage of b cells, cd4 + t cells, cd8 + t cells, nk cells, total monocytes, cd169 + , hla-dr low and cd141 + classical monocyte subsets, total neutrophils among cd45 + cells, and cd10 low cd101and cd10 low cd16 low neutrophil subset among all neutrophils in individual patients from each group, with the moderate category (6 patients) highlighted. b. plasma concentration of ifnα in moderate covid-19 patients compared to the three other groups. c. 
roc analysis showing performance of a diagnostic test using percentage of non-classical monocytes among total monocytes to distinguish controls and mild covid patients from moderate and severe covid patients; d. monocyte subset analysis in the peripheral blood of 2 severe patients, before (left panels) and after (right panels) treatment with the indicated anti-il-6 antibodies; e. percentage of hla-dr low classical monocytes among classical monocytes in a cohort of 22 patients and 17 controls grouped into 4 clinical categories; f. correlation between the percentage of hla-dr low classical monocytes and non-classical monocytes; g. percentage of cd16 low neutrophils among neutrophils in control and covid-19 patients of the learning cohort described in figure 7 . h. roc curves evaluating the discriminating significance of calprotectin plasma level (yellow), nonclassical monocyte fraction (red), cd16 low circulating neutrophils (blue) and ifnα2a plasma level (white) between controls/mild patients and moderate/severe patients. auc, area under the curve; mann whitney test; i. heatmap of blood and bronchoalveolar lavage fluid scrnaseq cells integrated defining the 5 regions of cell populations; j. pathway analysis (cytoscape and cluego) of degs expressed at a higher level in bronchoalveolar monocytes/macrophages from mild versus severe patients. k. umap analysis of neutrophils with s100a8 (left panel) and s100a9 (right panel) gene expression level projection (low expression = grey dots; high expression = dark blue dots). * p<0.05; ** p <0.01; *** p<0.001. lead contact. further information and request for resources and reagents should be directed to and will be fulfilled by the lead contact: florent_ginhoux@immunol.astar.edu.sg (f.gi.) patients. 
this non-interventional study was approved by institutional review boards of cochin-port royal (paris, france) and gustave roussy (villejuif, france) hospitals and the ethical committee of cochin-port royal hospital (clep decision n°: aaa-2020-08023), and conformed to the principles outlined in the declaration of helsinki. controls (n = 72) were symptomatic patients who were seen at hôtel-dieu or gustave roussy covid-19 screening unit and were negative for sars-cov-2 rt-pcr on pharyngeal swab. mild covid-19 patients (n = 27) were defined by having limited clinical symptoms (fever, cough, diarrhea, myalgia, anosmia/ageusia) that did not require ct-scan or hospitalization. moderate cases (n = 16) were defined as symptomatic patients with dyspnea and radiological findings of pneumonia on thoracic ct scan, requiring hospitalization and a maximum of 9 l/min of oxygen. in the larger part of this study, mild and moderate cases were analyzed together and grouped under "mild category". severe patients (n = 43) were those hospitalized in the icu with respiratory distress requiring 10l/min of oxygen or more, without or with endotracheal intubation and mechanical ventilation. sampling. whole human peripheral blood was collected into sterile vacutainer tubes containing edta or heparin. except for single cell rna sequencing, tubes were centrifuged at 300 g for 5 min at room temperature and plasma was collected. whole blood was mixed at a 1:1 ratio with whole blood cell stabilizer (cytodelics), incubated at room temperature for 10 min and transferred to -80°c freezer to await analysis. these samples were secondarily thawed in a water bath set to +37°c. cells were fixed at a ratio 1:1 with fixation buffer (cytodelics, ratio 1:1) and incubated for 10 min at room temperature. red blood cells were lysed by addition of 2 ml of lysis buffer (cytodelics, ratio 1:4) at room temperature for 10 min. white blood cells were washed with 2 ml of wash buffer (cytodelics, ratio 1:5). 
spectral flow cytometry. cells were resuspended in 100 µl extra-cellular antibody cocktail and incubated at room temperature for 15 min. for intra-cellular labelling, a step of permeabilization was performed using 200 µl of bd cytofix/cytoperm kit (bd); cells were then incubated for 40 min at +4°c, washed in perm buffer (bd) and resuspended in intra-cellular antibody cocktail. after incubation, cells were washed in flow cytometry buffer (1% bsa, 0.5% na-azide and 0.5m edta in pbs) and resuspended to proceed to the acquisition. all antibodies are listed in the key resource table. samples were acquired on cytek aurora flow cytometer (cytek biosciences). fcs files were exported and analyzed using flowjo software. to fully capture peripheral blood cell heterogeneity, we analyzed fresh samples without cell sorting or freezing and without ficoll enrichment, minimizing time of incubation and processing. sample preparation was done at room temperature. after red cell lysis, single-cell suspensions were loaded onto a chromium single cell chip (10x genomics) according to the manufacturer's instructions for coencapsulation with barcoded gel beads at a target capture rate of ~7000 individual cells per sample. to analyze neutrophils, we added rnase inhibitor (rnase out recombinant ribonuclease inhibitor invitrogen, 40u/ml) into the loading buffer. captured mrnas were barcoded during cdna synthesis using the chromium single cell 3' solution v3 (10x genomics) according to the manufacturer's instructions. of note, we increased the pcr cycles by two during cdna amplification. all samples (at day 0 and day 10) were processed simultaneously with the chromium controller (10x genomics) and the resulting libraries were prepared in parallel in a single batch. we pooled all of the libraries for sequencing in a single sp illumina flow cell. 
all of the libraries were sequenced with an 8-base index read, a 28-base read1 containing cell-identifying barcodes and unique molecular identifiers (umis), and a 91-base read2 containing transcript sequences on an illumina novaseq 6000. reads were aligned to the hg19 genome and were used for subsequent analysis. using the package seurat v3 (stuart et al., 2019), we normalized and scaled scrna sequencing data. we next applied a principal component analysis to the scrna sequencing results, yielding a number of significant pcs (using jackstraw plot analysis). in addition, the differences in standard deviation from one pc to the next were taken into account as described in the seurat v3 manual (stuart et al., 2019). to generate umap plots, min_distance was set to 0.3 and n_neighbors was set to 30. after dimensionality reduction, distinct clusters were identified using the findclusters function. the resolution of this function was reduced to 0.3 to identify the main cellular populations only. following this, differentially expressed genes were identified by running the findallmarkers function and selecting genes with logfc ≥ 0.25 or ≤ -0.25 and fdr < 0.05. this approach identified a number of well characterized blood cell populations. clustering and analysis of specific cell populations were performed in the same manner. cells were clustered and separated based on well described markers (e.g., cd14/cd16 to describe monocyte populations). the bronchoalveolar dataset was downloaded from the nih geo database (liao et al., dataset gse145926) and integrated with our own blood scrnaseq data using the seurat v3 anchoring method (stuart et al., 2019). briefly, the datasets were normalized independently and the highly variable genes were identified for each dataset using the seurat pipeline. a corrected data matrix with both datasets was then generated using the seurat v3 anchoring procedure to allow for joint analysis.
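the differential-expression cutoff described above (absolute logfc of at least 0.25 and fdr below 0.05) can be illustrated with a minimal python sketch. it is not the seurat implementation: the `GeneResult` type, the `select_degs` helper and the example values are hypothetical and only show how the two thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log_fc: float  # log fold change between the two groups compared
    fdr: float     # multiple-testing-adjusted p-value


def select_degs(results, min_abs_log_fc=0.25, max_fdr=0.05):
    """Keep genes with |logFC| >= min_abs_log_fc and FDR < max_fdr."""
    return [
        r for r in results
        if abs(r.log_fc) >= min_abs_log_fc and r.fdr < max_fdr
    ]


# Hypothetical values, for illustration only (not data from the study).
example = [
    GeneResult("s100a8", 1.20, 0.001),    # up-regulated, passes both cutoffs
    GeneResult("hla-dra", -0.80, 0.010),  # down-regulated, passes both cutoffs
    GeneResult("actb", 0.05, 0.0001),     # significant but effect too small
    GeneResult("cd14", 0.40, 0.20),       # large effect but not significant
]
degs = select_degs(example)
print([r.gene for r in degs])
```

both up- and down-regulated genes are retained because the cutoff is applied to the absolute value of logfc.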
the matrix was scaled and a principal component analysis (pca) was performed using the seurat v3 pipeline. a umap was performed on the first 30 principal components (pcs). selection of these principal components and the subsequent clustering and analysis of scrna sequencing data were performed as previously described. comparisons between patient samples were performed by a variation of the findmarkers function that compared the differentially expressed genes from different samples, patient groups, and organs. cutoff values were determined as previously described. rt-qpcr analysis. total rna was extracted with rneasy mini kit (qiagen) and reverse transcribed with superscript™ iv vilo™ master mix with ezdnase™ enzyme (invitrogen). real-time quantitative polymerase chain reaction (rt-qpcr) was performed using power sybr™ green pcr master mix in a biorad cfx96 thermocycler using the standard sybr green detection protocol as outlined by the manufacturer (applied biosystems). briefly, 12 ng of total cdna, 50nm (each) primers and 1× sybr green mixture were used in a total volume of 20 μl. human primer sequences are the following: gus (f: gaaaatatgtggttggagagctcatt; r: ccgagtgaagatccccttttta); hprt (f: ggacaggactgaacgtcttgc; r: cttgagcacacagagggctaca); s100a8 (f: caacactgatggtgcagttaacttc; r: ctgccacgcccatctttatc); s100a9 (f: ctgagcttcgaggagttcatca; r: cgtcaccctcgtgcatcttc). table 6 ) were centrifuged for 15 min at 1,000 g, diluted 1:4, then monitored using the bio-plex pro™ human chemokine panel 40-plex assay (bio-rad, ref: 171ak99mr2) according to the manufacturer's instructions. the 40 cytokines and chemokines in the panel are: ccl1, ccl11, ccl13, ccl15, ccl17, ccl19, ccl2, ccl20, ccl21, ccl22, ccl23, ccl24, ccl25, ccl26, ccl27, ccl3, ccl7, ccl8, cx3cl1, cxcl1, cxcl10, cxcl11, cxcl12, cxcl13, cxcl16, cxcl2, cxcl5, cxcl6, cxcl8, cxcl9, gm-csf, ifnα, il-10, il-16, il-1b, il-2, il-4, il-6, mif, tnfa.
acquisitions and analyses were performed on a bio-plex 200 system (bio-rad) and a bio-plex manager 6.1 software (bio-rad), respectively. soluble calprotectin (diluted 1:100) and ifnα2a were analyzed using a r-plex human calprotectin antibody set (meso scale discovery, ref: f21yb-3) and the ultra-sensitive assay s-plex human ifnα2a kit (meso scale discovery, ref: k151p3s-1), respectively, following manufacturer's instructions. acquisitions and analyses of soluble calprotectin and ifnα were performed on a meso™ quickplex sq120 reader and the msd's discovery workbench 4.0. each plasma sample was assayed twice with the average value taken as the final result. data representation was performed with software r v3.3.3 using tidyverse, dplyr, ggplot2, ggpubr, pheatmap, corrplot or hmisc packages. mass cytometry. cells were barcoded using the 20-plex pd barcoding kit (fluidigm). briefly, they were washed in barcode perm buffer, resuspended in 800 µl of barcode perm buffer and 100 µl of each barcode were transferred to the appropriate sample. cell suspensions were incubated for 30 min at room temperature, washed twice with cell staining buffer (fluidigm) and pooled, suspended in 100 µl filtered antibody cocktail, and incubated for 30 min at +4°c. all antibodies used are listed in key resource table. after staining, cells were washed with cell staining buffer and permeabilized with 200 µl of fix/perm from foxp3/transcription factor staining buffer kit (ebiosciences), 40 min at +4°c. after incubation, cells were washed in perm buffer from foxp3/transcription factor staining buffer kit (ebiosciences), resuspended in 100 µl filtered antibody cocktail, incubated for 30 min at +4°c, washed in cell staining buffer and resuspended in 50 µl of cytofix/perm for 5 min at room temperature. then, 400 µl of pbs containing 1.6% pfa + iridium (1:4000) were added for 35 min at room temperature. 
finally, cells were washed in cell staining buffer, resuspended in 50 µl and stored at +4°c until acquisition. cells were counted, washed and resuspended in maxpar cell acquisition solution at 0.5x 10 6 / ml and mixed with 10% eq beads immediately before acquisition on helios mass cytometer using noise reduction, event length limits of 10-150 pushes. an average of 500,000 events were acquired per sample at a flow rate of 0.03ml/min. mass cytometry standard files were normalized to a global standard determined for each log of eq beads using cytof software v. 6.7.1014 (fluidigm) . fcs files were exported and analysed using flowjo software. umap was performed with n_neighbours of 15 and a min_distance of 0.2. clusters were identified by the detection of commonly used cell markers (t cells expressing cd4 or cd8, neutrophils expressing cd15, and monocytes expressing cd14 and or cd16). whole-blood samples (200µl) were labelled with anti-cd14-pc7 (clone rmo52); cd16-pb (clone 3g8); cd2-fitc (clone 39c1.5); cd56-pc5.5 (clone n901); cd24-pe (clone alb9); cd45-ko (clone j33) and hla-dr-apc (clone immu-357) antibodies, all purchased from beckman-coulter. red blood cells were lysed with 1 ml versalysetm (beckman coulter) before sample analysis with a navios cytometer (beckman coulter) as described (tarfi et al., 2019) . monocytes were selected as cd45 high /ssc int cells among living cells and singlets before excluding t cells as cd2 + /ssc low , nk cells as cd56 + /ssc low/int , b cells as cd24 + /ssc low , immature and mature granulocytes as cd24 + /ssc int/high , cd16 bright residual granulocytes, and remaining cd14 − cd16 − cells corresponding mainly to basophils and nk cells not previously excluded. monocyte subsets were detected on a cd45/scc dot plot, using a cd14/cd16 scattergram that separates cd14 high cd16 low (classical), cd14 high /cd16 high (intermediate) and cd14 low cd16 high (nonclassical) subsets. 
finally, the proportion of hla-dr low monocytes was evaluated on a hla-dr/cd14 scattergram. data analysis. calculations and statistical tests were performed using r v3.3.3. unless stated otherwise, p-values are two-sided with 95% confidence intervals for the reported statistic of interest. individual data points representing the measurement from one patient are systematically calculated from the corresponding distribution. the wilcoxon rank-sum test was applied to assess differences in concentration between two groups. when indicated, the false discovery rate (fdr, p > 0.05) was controlled using the benjamini-hochberg procedure. spearman correlations were computed using the hmisc r package and cytokine results were shown using the r package pheatmap. soluble factor fold ratios were calculated as the log2 transformation of values of mild and severe patients versus the median value of all control patients, and were converted to z scores. hierarchical clustering of the patients based on the z score of 42 soluble factors was performed using euclidean distance and ward.d clustering. gene ontology networks were made by subjecting the degs from the previous scrna sequencing analysis to the cytoscape add-on cluego. the selected degs were those with increased expression in monocytes and neutrophils from mild or severe sars-cov-2 positive patients. the biological process gene ontologies selected had an fdr < 0.05. other statistical analyses were performed using graphpad prism 7. a generalized linear model was also used to analyze interactions between biological parameters. first, neutrophil count, calprotectin, fibrinogen, il-6 and d-dimers were normalized using log transformation. then, calprotectin plasma level was modeled using multivariable linear regression adjusted for the other parameters and their interaction with the groups. a similar approach was used to model il-6. backward selection was applied to obtain a parsimonious model.
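the benjamini-hochberg procedure mentioned above can be sketched in a few lines of python. this is a generic illustration of the standard step-up adjustment, not the authors' r code: the function name, the example p-values and the 0.05 threshold used in the last line are assumptions made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone (each is at most the one above it).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)
# Assumed threshold: call a test significant when its adjusted p-value < 0.05.
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

in this example the third and fourth tests have raw p-values below 0.05 but are no longer significant after adjustment, which is the behaviour the procedure is meant to provide when many markers are tested at once.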
to identify the most discriminant markers, we used a logistic regression adjusted for the scaled log2-transformed markers. parameters were penalized using the least absolute shrinkage and selection operator (lasso) to limit overfitting due to the high number of markers. the regularization parameter was selected by 10-fold cross-validation using the glmnet r package (friedman et al., 2010). the final auc estimate was corrected for optimism using harrell's method (harrell et al., 1996), and its confidence interval was computed using the two-stage approach proposed by noma et al. (2020) with 2000 bootstrap samples for each stage. in this analysis, we included age, sex and comorbidities together with biological parameters. given the absence of a validation cohort, the auc was corrected to limit overfitting bias. this correction indicated an auc of 99.7% (95% confidence interval [98.8%; 100.0%]). the final score corresponds to the following equation:
activated neutrophils exert myeloid-derived suppressor cell activity damaging t cells beyond repair middle east respiratory syndrome corona virus spike glycoprotein suppresses macrophage responses via dpp4-mediated induction of irak-m and ppargamma alarmins mrp8 and mrp14 induce stress tolerance in phagocytes under sterile inflammatory conditions the nlrp3 inflammasome functions as a driver of the myelodysplastic syndrome phenotype dimensionality reduction for visualizing single-cell data using umap host-viral infection maps reveal signatures of severe covid-19 patients ifn-i response timing relative to virus replication determines mers coronavirus infection outcomes clinical and
immunological features of severe and moderate coronavirus disease 2019 elevated serum levels of s100a8/a9 and hmgb1 at hospital admission are correlated with inferior clinical outcomes in covid-19 patients epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in wuhan, china: a descriptive study the architecture of a scrambled genome reveals massive levels of genomic rearrangement during development induction of myelodysplasia by myeloid-derived suppressor cells comparative replication and immune activation profiles of sars-cov-2 and sars-cov in human lungs: an ex vivo study with implications for the pathogenesis of covid-19 activation of human monocytes after infection by human coronavirus 229e star: ultrafast universal rna-seq aligner novel therapeutic approach to improve hematopoiesis in low risk mds by targeting mdscs with the fc-engineered cd33 antibody bi 836858 developmental analysis of bone marrow neutrophils reveals populations specialized in expansion, trafficking, and effector functions transcriptome assessment reveals a dominant role for tlr4 in the activation of human monocytes by the alarmin mrp8 a randomized, double-blind, placebo-controlled phase ii study of maintenance therapy with tasquinimod in patients with metastatic castration-resistant prostate cancer responsive to or stabilized during first-line docetaxel chemotherapy signaling mechanisms inducing hyporesponsiveness of phagocytes during systemic inflammation chronic inflammation in the etiology of disease across the life span complex immune dysregulation in covid-19 patients with severe respiratory failure a sars-cov-2 protein interaction map reveals targets for drug repurposing clinical characteristics of covid-19 in china. reply developmental and functional heterogeneity of monocytes impaired type i interferon activity and exacerbated inflammatory responses in severe covid-19 patients. 
medrxiv receiver operating characteristic (roc) curve analysis for medical diagnostic test evaluation the transcription factor nr4a1 (nur77) controls bone marrow differentiation and the survival of ly6c-monocytes slan-defined subsets of cd16-positive monocytes: impact of granulomatous inflammation and m-csf receptor mutation clinical features of patients infected with 2019 novel coronavirus in wuhan neutrophil-derived s100 calcium-binding proteins a8/a9 promote reticulated thrombocytosis and atherogenesis in diabetes causal analysis approaches in ingenuity pathway analysis monocyte conversion during inflammation and injury high levels of s100a8/a9 proteins aggravate ventilator-induced lung injury via tlr4 signaling the lung is a site of platelet biogenesis and a reservoir for haematopoietic progenitors sars-cov-2 and viral sepsis: observations and hypotheses single-cell landscape of bronchoalveolar immune cells in patients with covid-19 high incidence of venous thromboembolic events in anticoagulated severe covid-19 patients monocytic hla-dr expression in intensive care patients: interest for prognosis and secondary infection prediction identification of monocyte-like precursors of granulocytes in cancer as a mechanism for accumulation of pmn-mdscs tocilizumab, an anti-il6 receptor antibody heterogeneity of neutrophils mechanisms for the transendothelial migration of hiv-1-infected monocytes into brain phase 1 study of lenzilumab, a recombinant anti-human gm-csf antibody, for chronic myelomonocytic leukemia (cmml) blockade of antimicrobial proteins s100a8 and s100a9 inhibits phagocyte migration to the alveoli in streptococcal pneumonia presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with covid-19 in the new york city area induction of nuclear factor-kappab responses by the s100a9 protein is toll-like receptor-4-dependent a mir-150/tet3 pathway regulates the generation of mouse and human non-classical monocyte subset cesarean 
section and chronic immune disorders cytoscape: a software environment for integrated models of biomolecular interaction networks human s100a9 potentiates il-8 production in response to gm-csf or fmlp via activation of a different set of transcription factors in neutrophils human cytomegalovirus induces monocyte differentiation and migration as a strategy for dissemination and persistence comprehensive integration of single-cell data disappearance of slan-positive non-classical monocytes for diagnosis of chronic myelomonocytic leukemia with associated inflammatory state role of leptin and socs3 in inhibiting the type i interferon response during obesity breadth of concomitant immune responses prior to patient recovery: a case report of non-severe covid-19 sars coronavirus pathogenesis: host innate immune responses and viral antagonism of interferon s100-alarmin-induced innate immune programming protects newborn infants from sepsis advancing scientific knowledge in times of pandemics myeloid-derived suppressor cells coming of age autoinhibitory regulation of s100a8/s100a9 alarmin activity locally restricts sterile inflammation investigational cd33-targeted therapeutics for acute myeloid leukemia s100a8/a9 in inflammation. front immunol 9 comorbidities and multi-organ injuries in the treatment of covid-19 a single-cell atlas of the peripheral immune response in patients with severe covid-19 middle east respiratory syndrome coronavirus orf4b protein inhibits type i interferon production through both cytoplasmic and nuclear targets sars-coronavirus replication in human peripheral monocytes/macrophages our teams are supported by grants from ligue nationale contre le key: cord-253302-keh7s758 authors: gong, danyang; dai, xinghong; jih, jonathan; liu, yun-tao; bi, guo-qiang; sun, ren; zhou, z. 
hong title: dna-packing portal and capsid-associated tegument complexes in the tumor herpesvirus kshv date: 2019-09-05 journal: cell doi: 10.1016/j.cell.2019.07.035 sha: doc_id: 253302 cord_uid: keh7s758 assembly of kaposi’s sarcoma-associated herpesvirus (kshv) begins at a bacteriophage-like portal complex that nucleates formation of an icosahedral capsid with capsid-associated tegument complexes (catcs) and facilitates translocation of an ∼150-kb dsdna genome, followed by acquisition of a pleomorphic tegument and envelope. because of deviation from icosahedral symmetry, kshv portal and tegument structures have largely been obscured in previous studies. using symmetry-relaxed cryo-em, we determined the in situ structure of the kshv portal and its interactions with surrounding capsid proteins, catcs, and the terminal end of kshv’s dsdna genome. our atomic models of the portal and capsid/catc, together with visualization of catcs’ variable occupancy and alternate orientation of catc-interacting vertex triplexes, suggest a mechanism whereby the portal orchestrates procapsid formation and asymmetric long-range determination of catc attachment during dna packaging prior to pleomorphic tegumentation/envelopment. structure-based mutageneses confirm that a triplex deep binding groove for catcs is a hotspot that holds promise for antiviral development. resolution of kshv's asymmetric icosahedral structure is achieved via symmetry-relaxed cryo-em using sequential localized classification. interactions between the dnatranslocating portal protein, associated capsid and tegument proteins, and the viral genome reveal surprising variability in tegument protein occupancy and orientation plasticity. 
first discovered in 1994 in association with tumor lesions in aids patients in los angeles (chang et al., 1994), kaposi's sarcoma-associated herpesvirus (kshv) has since been shown to cause endemic cancers in sub-saharan africa, the greater mediterranean, and the xinjiang region of china (ganem, 2010; giffin and damania, 2014). kshv is a member of the herpesvirus subfamily gammaherpesvirinae, which also includes epstein-barr virus (ebv), the first identified human oncovirus. as in all herpesviruses, assembly of an infectious kshv virion starts at a bacteriophage-like portal complex that putatively nucleates the formation of a t = 16 icosahedral capsid, which, at maturation, is composed of major capsid protein (mcp), small capsid protein (scp), and ab 2 heterotrimers of the tri1 monomer and tri2 dimer, and is decorated by capsid-associated tegument complexes (catcs) (cardone et al., 2012). upon establishment of an initial procapsid, the portal facilitates the translocation of kshv's ∼150-kb genome. this key process involves recruitment of an atp-driven terminase (yang et al., 2007; heming et al., 2017) to the unique portal vertex to recognize, package, and cleave concatemeric viral double-stranded (ds)dna, which, in conjunction with catc's critical supporting roles (heming et al., 2017), gives rise to viable genome-containing nucleocapsids (adelman et al., 2001; beard et al., 2002). unlike the comparatively high occupancies of capsid-associated tegument proteins in alphaherpesviruses (wang et al., 2018) and betaherpesviruses (yu et al., 2017), kshv catc binding sites are only partially and/or more flexibly occupied, leading to poorly resolved catc structures in prior icosahedral reconstructions of kshv (dai et al., 2014).
the catc nonetheless plays a critical role in the release of a pleomorphic virion because the recruitment of outer tegument proteins and a glycoprotein-sporting envelope leading to virion egress depends on interactions of various viral proteins with constituents of the catc (owen et al., 2015; sathish et al., 2012). in the absence of in situ kshv portal and catc structures, how a single portal protein (porf43) orchestrates the rise of a robust capsid and how order is maintained in the subsequently complex and variable processes of tegumentation and envelopment remain unknown. an invaluable body of pioneering work has been accomplished so far on the portal of the prototypical herpesvirus herpes simplex virus type 1 (hsv-1), including microscopy studies confirming its localization at a capsid vertex (cardone et al., 2007), its dodecameric stoichiometry (rochat et al., 2011), and, more recently, the impressive identification and reconstruction of the 5-fold vertex region surrounding the portal (mcelwee et al., 2018). here we present the first atomic structures of a gammaherpesvirus portal vertex in kshv, which allowed us to use structure-guided mutageneses to identify a viable drug target. simultaneously, our method of symmetry relaxation and sequential localized classification enabled us to dissect a heretofore puzzling schema of variable catc occupancy at capsid vertices, providing insights into portal-seeded capsid assembly and how structural plasticity arises from a well-defined and highly ordered capsid. to determine high-resolution structures of kshv's unique portal vertex, we imaged frozen-hydrated kshv virions, obtaining 44,328 virion particle images. prior kshv capsid reconstructions, although informative, have been calculated with icosahedral symmetry applied, obscuring non-icosahedrally related structures (trus et al., 2001; wu et al., 2000).
here we developed a workflow (figures s1 and s2) to determine structures of the non-icosahedrally arranged components. this sub-particle data processing procedure, employing sequential localized classification with symmetry relaxation, allows stepwise reconstruction of selected regions of a viral particle. briefly, two rounds of sub-particle classification were performed to relax icosahedral and local symmetries. in the first round, 12 "sub-particles" of capsid vertices were extracted from each virion image using coordinates calculated from an initial icosahedral reconstruction of the virion particle. we then performed 3d classification with 5-fold symmetry applied to sort out the unique portal vertex sub-particle from the 11 penton vertex sub-particles for each virion image. subsequent refinement yielded a 4.3-å resolution reconstruction of the portal vertex with 5-fold (c5) symmetry (first visualized in hsv-1; mcelwee et al., 2018), revealing high-resolution structures of the capsid components surrounding the portal (table 1; figures s1, s2a, and s2d). a second round of sub-particle classification further relaxed c5 symmetry at the portal vertex, and subsequent refinement with c12 symmetry resulted in a 4.7-å reconstruction, revealing high-resolution features of the dodecameric portal (table 1; figures s1, s2a, and s2e). using a specific orientation determined in our c12 classification, we then calculated an asymmetric (c1) reconstruction of the entire dsdna-containing capsid at 7.6 å (table 1; figures 1a, s1, s2b, and s2g; video s1) and a c1 reconstruction of the portal vertex at 5.2 å (table 1; figures 1b, s1, s2a, and s2f).
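The first extraction step above, locating the 12 vertex sub-particles of an icosahedral particle, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the unit radius, and the convention that a 3x3 rotation matrix maps the model frame to the particle frame are all assumptions made for illustration.

```python
import math

def icosahedron_vertices(radius=1.0):
    """Return the 12 vertices of an icosahedron, scaled to lie at `radius`."""
    phi = (1 + math.sqrt(5)) / 2  # golden ratio
    raw = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            # cyclic permutations of (0, +/-1, +/-phi)
            raw.extend([(0.0, a, b), (a, b, 0.0), (b, 0.0, a)])
    norm = math.sqrt(1 + phi * phi)
    return [tuple(radius * c / norm for c in v) for v in raw]

def rotate(matrix, point):
    """Apply a 3x3 rotation matrix (nested lists) to a 3D point."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))

def vertex_subparticle_centers(rotation, radius):
    """Hypothetical helper: centers of the 12 vertex sub-particles of one
    particle, given its orientation (assumed model-to-particle rotation)."""
    return [rotate(rotation, v) for v in icosahedron_vertices(radius)]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
centers = vertex_subparticle_centers(identity, 1.0)
```

In a real workflow the radius would be the capsid vertex radius in pixels or ångströms and the rotation would come from the icosahedral refinement; both are left as inputs here because the source does not give them.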
our c1 structures reveal concentrically packed layers of dna with an inter-duplex distance of ∼25 å, quasi-5-fold-organized capsid and tegument densities, a quasi-12-fold-symmetric portal dodecamer, and asymmetric terminal dna within a dna translocation channel capped by a distinctive density visible at lower thresholds (figures 1a and 1b). this stepped implementation of symmetry expansion and sub-particle classification thus proved effective in teasing apart the multiple symmetry mismatches present in herpesvirus capsids.
structure of the dna-translocating portal
using our c12 portal reconstruction, we atomically modeled kshv's 605-amino acid (aa) portal protein porf43 (table 1; figure 1c; video s2). despite a lack of sequence homology, our porf43 model showed striking similarities with structures of phage portal proteins in domain organization and topology (lebedev et al., 2007; lokareddy et al., 2017; sun et al., 2015). we thus named the five domains of our porf43 model in a fashion analogous to phage portal proteins: wing (aa 9-45 and 126-245), wall (aa 46-125 and 496-553), stem (aa 246-272 and 454-477), clip (aa 273-453 [aa 281-412 unmodeled]), and β-hairpin (aa 478-495) (figure 1d). the n-terminal eight residues (aa 1-8) and c-terminal 52 residues (aa 554-605) were disordered and also not modeled. also similar to phages, 12 copies of porf43 arrange in a vaguely flying saucer-like ring (lebedev et al., 2007; lokareddy et al., 2017; sun et al., 2015). in agreement with previous tomographic (cardone et al., 2007; chang et al., 2007) and intermediate-resolution (mcelwee et al., 2018; rochat et al., 2011) visualizations of herpesvirus portals, we observe porf43 portal docking in the capsid through interactions between the portal's wing domains and the surrounding mcp floor and, additionally, between the base of the portal clip and the tri1 subunit of periportal triplexes (figure 1b).
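The porf43 domain boundaries listed above can be written down as a small lookup table, which also makes it easy to check that the stated segments tile residues 9-553 without gaps or overlaps. This is only a bookkeeping sketch; the names `DOMAINS`, `UNMODELED`, `domain_of`, and `is_modeled` are hypothetical and not part of any published tool.

```python
PORF43_LENGTH = 605

# Residue ranges (inclusive) as stated in the text.
DOMAINS = {
    "wing": [(9, 45), (126, 245)],
    "wall": [(46, 125), (496, 553)],
    "stem": [(246, 272), (454, 477)],
    "clip": [(273, 453)],
    "beta-hairpin": [(478, 495)],
}

# Disordered termini plus the unmodeled clip turret (aa 281-412).
UNMODELED = [(1, 8), (281, 412), (554, 605)]

def domain_of(residue):
    """Return the domain name for a residue number, or None if unassigned."""
    for name, segments in DOMAINS.items():
        if any(start <= residue <= end for start, end in segments):
            return name
    return None

def is_modeled(residue):
    """True if the residue lies in a named domain and outside unmodeled ranges."""
    in_gap = any(start <= residue <= end for start, end in UNMODELED)
    return domain_of(residue) is not None and not in_gap

modeled_count = sum(is_modeled(r) for r in range(1, PORF43_LENGTH + 1))
```

With these ranges, `modeled_count` follows directly from the stated boundaries: residues 9-553 minus the unmodeled stretch 281-412.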
clip, stem, β-hairpin, and wall domains largely form the interior of the portal channel (i.e., dna translocation channel) through which dna is threaded into the capsid during genome packaging (figures 1d and 1e). of the channel-lining domains, β-hairpin and clip domains form the most constricted regions of the channel at ∼28 å and ∼32 å in diameter, respectively (figures 1d and 1e). notably, both regions contain characteristic β sheets and interact with dna in our mature virion state. arranged radially, 12 β-hairpins comprise an apertured disk perpendicular to the portal channel axis (figure 1e). interestingly, residues that line the channel are markedly positively charged above this aperture, likely facilitating interactions with negatively charged dna (figure 1f). twelve sets of three-stranded β sheets roughly parallel to the channel axis form the base of the clip region (figure 1g). each β sheet is the result of a 2+1 augmentation motif, where two parallel clip β strands from each porf43 subunit are augmented in anti-parallel fashion by a single clip β strand from its clockwise neighbor (when viewing toward the capsid interior) (figure 1g, insets); this results in a "daisy-chained" ring structure, conceivably ideal for propagating and coordinating conformation changes among subunits in the dodecameric complex during dsdna translocation. in contrast to phage portals, a turret-like density arises from our kshv clip, extending distally toward the portal-capping density (figures 1b and 1g). although we were unable to model this clip turret, we identified what appeared to be helical structures comprising the turret walls (figure 1g), consistent with porf43 secondary structure and disorder predictions, which show strong helix propensity in this region (aa 281-412) (figure s3a). during dna packaging, the distal end of this turret is the putative docking site of terminase (yang et al., 2007; heming et al., 2017).
the lower resolution of the clip turret is thus likely a result of inherent plasticity in the structure to accommodate interactions with various partners during different stages of viral assembly (e.g., terminase during active dna packaging and the portal cap after packaging).
a portal-effected global distribution of catc
our c1 portal vertex reconstruction revealed tegument densities sitting atop the periportal triplexes ta and tc (figures 1b and s2f). the morphology of these densities, a helix bundle supported by a triplex-bridging base, resembles that of catcs identified around penton vertices in previous icosahedral reconstructions of kshv (dai et al., 2014) and of neurotropic hsv-1 and hsv-2. however, peripenton catc densities in the kshv icosahedral reconstruction were distinctively weaker than surrounding capsid proteins and only discernible when low-pass-filtered to ∼6 å, suggesting low catc occupancy and/or flexibility (dai et al., 2014); in contrast, the hsv-1 icosahedral reconstruction indicated full catc occupancy. interestingly, our c1 reconstruction of the kshv portal vertex reveals the presence of five catc densities with a strength comparable with that of underlying capsid elements, indicating full occupancy of the five catc registers surrounding the portal vertex. this preferred association of catc with the portal vertex over the remaining 11 (penton) vertices suggests an important role of catc at the portal vertex, to be discussed later. our c1 reconstruction of the genome-containing capsid also indicates a curious binding pattern of catcs to penton vertices in a manner dictated by the portal vertex.
[figure 1 legend, panel c: porf43 model, shown as rainbow-colored ribbons (blue, n terminus, to red, c terminus). inset depictions are ribbon-and-stick in c12 mesh density.]
when displayed at a density threshold appropriate for capsid proteins, our c1 capsid reconstruction shows two adjacent (i.e., ortho) catcs bound to each portal-proximal penton vertex, one catc to each portal-distal penton vertex, and no visible catcs at the portal-opposite penton vertex (figure 2a). incrementally decreasing the density display threshold reveals additional catcs of progressively weaker density (indicating progressively lower occupancies at those registers) manifesting at penton vertices in a "portal-outward" manner (figures 2b and s4). importantly, within each penton vertex, catcs appear to selectively bind registers on the portal side of a hypothetical "equatorial" bisecting the penton vertex, with a clear preference for the most portal-side register (figure s4). adherence to this "portal-side equatorial rule" is exemplified by the observation that portal-proximal penton vertices (with two registers portal-side of the equatorial) max out at two catc copies as the density display threshold is decreased, whereas portal-distal penton vertices (with three portal-side registers) max out at three catc copies. portal-opposite vertices can bind the catc at any register (i.e., no preferred register), up to five catcs in total, because every register is portal-side (figures 2b and s4). given that catc occupancy varies among vertices in a nonrandom fashion, the existence of catcs of varying densities indicates that catc occupancy varies even among capsids and is thus not fully determined in our c1 reconstruction (i.e., the averaging of capsids with differing occupancies/binding patterns inherently obscures information on specific occupancy).
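The per-vertex maxima stated above fix an upper bound on penton catcs per capsid once the 11 penton vertices are grouped by their position relative to the portal. The sketch below derives that grouping from icosahedral geometry and then applies the maxima from the text (2, 3, and 5). The function names and the dot-product thresholds are illustrative assumptions; only the maxima come from the source.

```python
import math

def unit_icosahedron_vertices():
    """The 12 vertices of an icosahedron on the unit sphere."""
    phi = (1 + math.sqrt(5)) / 2
    norm = math.sqrt(1 + phi * phi)
    raw = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            raw.extend([(0.0, a, b), (a, b, 0.0), (b, 0.0, a)])
    return [tuple(c / norm for c in v) for v in raw]

def classify_vertices(portal_index=0):
    """Count vertices per class, by the cosine of the angle to the portal vertex."""
    verts = unit_icosahedron_vertices()
    portal = verts[portal_index]
    counts = {"portal": 0, "portal-proximal": 0,
              "portal-distal": 0, "portal-opposite": 0}
    for v in verts:
        cos_angle = sum(a * b for a, b in zip(portal, v))
        if cos_angle > 0.9:        # the portal vertex itself (cos = 1)
            counts["portal"] += 1
        elif cos_angle > 0.0:      # five neighbors (cos = 1/sqrt(5))
            counts["portal-proximal"] += 1
        elif cos_angle > -0.9:     # next ring of five (cos = -1/sqrt(5))
            counts["portal-distal"] += 1
        else:                      # the antipodal vertex (cos = -1)
            counts["portal-opposite"] += 1
    return counts

# Maximum catcs per penton vertex under the portal-side equatorial rule (from the text).
MAX_CATCS = {"portal-proximal": 2, "portal-distal": 3, "portal-opposite": 5}

def max_penton_catcs():
    counts = classify_vertices()
    return sum(counts[name] * limit for name, limit in MAX_CATCS.items())
```

With five portal-proximal, five portal-distal, and one portal-opposite vertex, the bound is 5 × 2 + 5 × 3 + 1 × 5 = 30 penton catcs per capsid, not counting the five portal-vertex catcs.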
thus, to accurately assess the specific occupancy of penton vertex catcs and to understand the structural basis of a catc's discriminatory association with portal and penton vertices, we relaxed 5-fold symmetry for penton vertex sub-particles and performed 3d focused classification of their catc-binding registers (figure s1). using a mask encompassing the region surrounding one catc, four resulting classes were obtained. although three classes clearly lacked catc densities, one class contained a catc density of a quality comparable with surrounding capsid protein densities (figure s1). 37.9% of masked sub-particles (i.e., 37.9% of penton vertex registers) were assigned to this catc-binding class, slightly higher than the ∼30% occupancy estimated in our previous kshv icosahedral study (dai et al., 2014). we further distinguished between possible catc binding occupancies and permutations at the five registers of a single penton vertex using geometry-based sub-particle classification. in all, eight possible permutations of zero to five catcs can bind a penton vertex, all of which were observed and reconstructed in our analyses (figure 3a). in agreement with trends observed in our c1 capsid reconstruction, penton vertices with two adjacent catcs bound were most abundant (ortho-catc-binding, at 38.6% of penton vertex sub-particles), followed by penton vertices with a single catc bound (one-catc-binding, at 21.9% of sub-particles). five-catc-binding penton vertices were rarest, as expected, at 0.2% of sub-particles because, barring deviations from the c1 capsid-observed consensus binding pattern, these should be limited to the portal-opposite vertices of highly bound capsids. finally, we tallied the total number of penton catcs in each capsid from our classified penton vertex sub-particles. the resulting histogram follows a sharp, slightly left-skewed gaussian distribution peaking at approximately 23 penton catcs per capsid (figure 3b).
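The count of eight binding permutations can be reproduced by enumerating all occupancy patterns of five registers and treating patterns related by rotation about the vertex's 5-fold axis as the same. That equivalence is an assumption about how the permutations were counted, but it yields eight classes; the function names are hypothetical.

```python
from itertools import product

def canonical(pattern):
    """Smallest rotation of an occupancy tuple, used as the class representative."""
    rotations = [pattern[i:] + pattern[:i] for i in range(len(pattern))]
    return min(rotations)

def occupancy_classes(n_registers=5):
    """Distinct occupancy patterns (1 = catc bound) up to rotation."""
    classes = {canonical(p) for p in product((0, 1), repeat=n_registers)}
    return sorted(classes, key=lambda p: (sum(p), p))

classes = occupancy_classes()
# Number of distinct patterns for each count of bound catcs, zero through five.
per_count = [sum(1 for p in classes if sum(p) == k) for k in range(6)]
```

Two bound catcs give two classes (adjacent or separated by one register), as do three bound catcs, while zero, one, four, and five bound catcs give one class each. Counting mirror images as equivalent does not change the total for a ring of five.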
intriguingly, additional catc binding falls off sharply after capsids have bound 30 penton catcs, which happens to be the theoretical maximum of a capsid with "full" penton vertex catc binding in compliance with the portal-side equatorial rule. in all, these results (the full occupancy of portal catc registers, the portal vertex-referenced directional binding of catc within penton vertices, and the portal-dictated maximum of allowed binding registers) strongly suggest that the nucleating portal effects long-range (allosteric) structural influence on the penton vertices of each capsid. further 3d refinement of classes obtained from our masked classification of penton vertex sub-particles yielded catc-binding and catc-absent reconstructions of penton vertex registers at 3.8 å and 3.7 å, respectively (table 1; figures s1, s2c, s2h, and s2i).
[figure 1 legend, continued: (d) two opposing porf43 portal subunits, colored by domains assigned according to dsdna phage portal structures (lebedev et al., 2007; lokareddy et al., 2017; sun et al., 2015), shown superimposed with gaussian-filtered c1 portal vertex density. (e) clip (top) and β-hairpin (bottom) slices define two narrow constrictions within the dna translocation channel. (f) electrostatic surface potential rendering of the portal's dna translocation channel. (g) c12 density shaded by the corresponding porf43 domain (gaussian-filtered c1 density shown for lower-resolution clip turret). mint green cylinders represent helix-like structures observed to extend from the clip in c1 reconstructions as in (b) but disordered in the c12 density. three consecutive porf43 subunits illustrate the daisy-chained 2+1 β sheet augmentation facilitated by the modeled clip's three β strands (insets). unmodeled turret densities extend from the middle ring of β strands (dotted black circles). see also figure s3 and videos s1 and s2.]
from the catc-binding reconstruction, we identified
and modeled the constituents of catc, which include two copies of porf19, two copies of porf64, and one copy of porf32 (table 1; figures 4a, 4b, and s5a); this is in one-to-one correspondence with pul25, pul36, and pul17, respectively, of hsv-1 catc. comparing reconstructions of the portal vertex and a catc-binding penton vertex reveals nearly identical structures of their associated catcs (figures 4c and 4d). at both vertices, porf32 bridges the triplexes ta and tc and supports a four-membered helix bundle composed of the n-terminal segments of two porf19 subunits (aa 62-104) and the very c-terminal segments of two porf64 subunits (aa 2,596-2,635) (figures 4b-4d). obvious differences aside (e.g., a portal in the portal vertex, an scp-decorated penton in the penton vertex, or catc occupancies), the two vertices also differ in the presence of a disk-like portal cap at the distal ends of the portal vertex's catc helix bundles (figure 4c, magenta) versus a globular density cocked to the right of the penton vertex's catc helix bundle (figure 4d, magenta). because this globular density's size and proximity to catc (figure 4e) are reminiscent of the flexibly tethered head domain of pul25 in hsv-1 (a homolog of kshv porf19), we interpret this density as the head domain of one of the porf19 copies in the catc. indeed, the crystal structure of an hsv-1 pul25 head domain (pdb: 2f5u) (bowman et al., 2006) satisfactorily fits into the globular density (figure 4f), lending support to its assignment as porf19. we then proceeded to construct a homology model of the porf19 head domain (aa 127-546) using pdb: 2f5u as a template (figures 4e and 4f). from the homology model, we estimate that aa 457-468 map to a finger-like density that inserts between an adjacent penton mcp and scp, serving as the sole interacting residues between catc and the penton protrusion (figure 4e).
of note, because the globular density accommodates only one copy of porf19/pul25 head domain (whereas the corresponding region in hsv-1 catc accommodates two), we previously asserted, based on a 6-å icosahedral reconstruction, that the kshv catc had a different stoichiometry than the hsv-1 catc (dai et al., 2014). but, as noted in a subsequent study demonstrating differences in globular head domain arrangement between kshv and neurotropic alphaherpesviruses (dai et al., 2014; liu et al., 2017), and as our present sub-particle reconstructions reveal, the kshv catc does, in fact, bear the same stoichiometry/architecture as the hsv-1 catc. these findings indicate that the second head domain of porf19 is present but perhaps flexibly tethered elsewhere. indeed, a second globular density gradually appears to the left of the penton vertex's catc helix bundle at lower resolutions/density thresholds (figures 4d, green circle, s5b, and s5c). in the kshv portal vertex, secondary structures in the portal cap beyond the catc helix bundles are not resolved as in the penton vertex's globular density. nonetheless, several lines of evidence support interpretation of the portal cap as porf19. first, catc helix bundles clearly connect with the portal cap at lower resolutions/density thresholds (figure 4c, black circles). second, the portal cap density is about the size of five porf19 head domains and further exhibits five weaker globular densities attached at its periphery (figure 4c, green circle), analogous to the globular density assigned to the second porf19 head domain at the penton vertex (figures 4d, green circle, s5b, and s5c). third, functional data indicate that hsv-1 pul25 (porf19's homolog), but not hsv-1 pul36 (porf64's homolog), is critical for viral genome encapsidation, especially near the termination of genome packaging (mcnab et al., 1998; ogasawara et al., 2001).
moreover, pul25 plays an important role in docking the incoming capsid at host cell nuclear pores and gating the release of the viral genome (huffman et al., 2017; pasdeloup et al., 2009). it is therefore plausible that porf19 could be intimately associated with the portal channel, and the portal cap is certainly poised in a prime position to execute porf19's roles in regulating the retention and release of the viral genome. despite similar architecture, kshv and hsv-1 catcs bind their underlying triplexes in somewhat different fashions. in both viruses, the porf32-equivalent catc subunit (pul17 in hsv-1) functions as a structural framework facilitating capsid association of the other four subunits that make up the catc helix bundle. side-by-side comparison of porf32 and pul17 models reveals similar core structures of separate n- and c-terminal β sheet-rich domains positioned beneath a central α helix-rich arch (figures 4g and 4h, orange). however, several prominent features visible in pul17 are missing in porf32, accounting for porf32's 249 fewer residues: a "hump," characterized by four short helices at the top of the arch that constrain and orient catc's helix bundle in hsv-1; an extended helix; and a short helix bundle at the vertex-proximal end of pul17. notably, pul17's short helix bundle makes direct contact with the underlying triplex ta, almost solely mediating catc-ta binding in hsv-1. in porf32, nearly the entirety of this region is disordered and/or invisible in both portal and penton vertex reconstructions save for a 13-aa "anchoring loop" (al; aa 220-232) (figure 4g, yellow). although porf32's 93 unmodeled residues in this region may potentially account for the critical ta-interacting helix bundle, a structure-based sequence alignment of porf32 with pul17 suggests that the ta-binding short helix bundle in pul17 is missing, even in the porf32 sequence (figure s3b).
the importance of porf32's al for ta binding becomes apparent in light of an alternate morphology of catc-decorated kshv triplexes not observed at the catc-decorated penton vertices of hsv-1. a comparison of our models of kshv triplexes with and without the catc (table 1; figures 4i and 4j) reveals that all ta triplexes underlying catcs experience a ∼120° counterclockwise axial rotation relative to canonical undecorated triplex ta (cf. figures 4i and 4j). our models suggest that this rotation may be due to the steric specificity of porf32's ta-binding al. specifically, a 120° counterclockwise rotation of ta exposes a deep groove running between the molecular boundaries of tri1 and the tri2a/b dimer that facilitates docking of porf32's al and, thus, catc. strikingly, despite catc-decorated ta's large degree of apical (i.e., main body) rotation, our structures show that tri1's n-anchor, the characteristic n terminus of tri1 subunits that penetrates the capsid floor, maintains a conserved orientation in the capsid interior at penton vertices regardless of ta's apical orientation (and, by extension, regardless of catc decoration) (figures 4i and 4j, respective insets). (periportal ta triplexes, which are all catc-decorated, exhibit 120° counterclockwise apical rotation, but their tri1 n-anchors are disordered and appear to adopt a unique configuration.) in the context of capsid assembly, our observations allow several structure-based inferences. first, the presence of a uniform tri1 n-anchor orientation (barring periportal triplexes) supports the notion that initial triplex attachment to the procapsid, putatively through the tri1 n-anchor, precedes the stage of catc binding (reviewed in heming et al., 2017). second, whatever the means by which the portal effects triplex orientation and, thus, catc binding (or perhaps vice versa), it presumably acts at a stage of procapsid maturation before triplexes are fully "stapled" to the mcp floor by main-body interactions.
this is predicated upon the necessity for triplex ta to have sufficient rotational freedom to permit adoption of a catc-bound orientation. third, for reasons one and two, catc binding occupancy is highly unlikely to be predetermined but, rather, determined with the simultaneous maturation of the procapsid, although this is subject to influence by the portal, as demonstrated previously. at the vertex-proximal end of catc, interactions with triplex ta occur exclusively through porf32. an n-terminal helix (nh; aa 2-14) and two strands from porf32's n-terminal β-barrel domain directly contact the apical surfaces of ta's tri2a/b dimer (figures 5a-5c; video s3). additionally, the aforementioned al binds in ta's deep hydrophobic cleft between tri1 and the tri2a/b dimer (figures 5b-5d; video s3). al binding involves three porf32 hydrophobic residues, val222, leu224, and phe226 (figure 5d), as well as β sheet-like hydrogen bonding between the al and an adjacent ta tri1 strand (figure 5e). at the vertex-distal end of catc, porf32 sits atop, but does not directly bind to, triplex tc. instead, catc-tc interactions rely on the n-terminal tails of catc's two porf19 subunits, which extend distally from the helix bundle and descend beneath porf32 to contact tc (figures 5a and 5f). contact with tc occurs mainly via a short helical motif (aa 20-24) from the n-terminal tail of the "upper" (magenta) porf19 subunit. an adjacent helix belonging to tc tri1 (aa 264-275) orthogonal to the porf19 helical motif fits within the motif's helical groove (figure 5f), with tri1's thr269 and arg268 and porf19's arg22 contributing hydrogen bonds to this intermolecular interaction (figure 5g). in contrast to upper porf19, "lower" (green) porf19 exhibits no apparent direct contacts with tc in our visible structure. nonetheless, lower porf19 plays an important role in lashing catc's constituents as a collective unit.
chiefly, lower porf19's n-terminal tail contributes two β strands (aa 23-25 and 31-33) to form two sets of β sheet interactions: the first with one β strand from porf32 (aa 319-321) and upper porf19 (aa 30-32) each (figure 5f, starred) and the second with a β strand from porf32 (aa 311-313) (figure 5h, starred). these two sets of β sheets fasten the vertex-distal end of catc in a quasi β-barrel. the first 17 and 21 residues of upper and lower porf19's n-terminal tails, respectively, are flexible and, thus, unmodeled (figure 5h). however, we speculate that some of these unmodeled residues may also bind tc, augmenting previously described catc-tc interactions at upper porf19's short helical motif. particularly, strong unassigned densities roughly three aa in length occupy the tri1-tri2a/b surface groove of tc (figure 5i), analogous to porf32's al occupying triplex ta's groove (figures 5b-5d). given that the n-terminal tail of upper porf19 appears to extend away from tc, and given the proximity of lower porf19's n th -most residue to tc's groove, the unassigned densities likely belong to the unmodeled n-terminal residues of lower porf19. if interactions here are also similar to al binding at ta (i.e., in part involving β sheet-like hydrogen bonds between adjacent backbones), then porf19 binding at this site may not necessarily be sequence-specific, thus accounting for the unresolved sidechain densities because of averaging. to validate our structural interpretation of the roles of the porf32 nh and al in catc-ta binding, we constructed three porf32 mutants with deletions in either the nh or al or both, named 32dnh, 32dal, and 32dnh-dal. we posited that these "loss-of-ta-interaction" porf32 mutants might serve as dominant negatives if nh and/or al are critical for catc-ta binding.
essentially, porf32 mutants expressed in kshv-replicating cells should compete against wild-type (wt) porf32 (32wt) for incorporation into catc so that mutant-incorporated catc would be deficient in capsid association, inhibiting kshv virion formation. indeed, expression of 32dnh and 32dal reduced virion production to 30.7% and 27.8% of that of the control, respectively, whereas expression of 32dnh-dal reduced it to 12.1% (figures 5j and 5k). importantly, viral dna replication and viral rna transcription were not significantly affected (figures s6a and s6c). these results confirm that the porf32 nh and al are both important for kshv virion production, supporting our interpretation that these aa stretches mediate catc-ta binding. similarly, to test our hypothesis that porf19's n-terminal tail is important for catc capsid association and, hence, function, we generated two porf19 mutants: 19dn28, which deletes the first n-terminal 28 residues of porf19 (i.e., all porf19 residues in contact with triplex tc), and 19dn17, which deletes porf19's first 17 residues (i.e., retains all porf19 residues visible in our structure but deletes the disordered, unmodeled n-terminal tails). compared with wt porf19 (19wt), 19dn28 expression reduced virion production to 9.7% of that of the control without significantly affecting viral dna replication or rna transcription (figures 5l, 5m, s6b, and s6d), indicating that porf19 n-terminal interactions with tc (figures 5f-5i) are important for viral efficacy. remarkably, expression of 19dn17 also inhibited virion production to 30.6% of that of the control (figures 5l, 5m, s6b, and s6d), suggesting that porf19's first 17 residues are required for optimal porf19-tc binding. these findings are therefore consistent with our speculation that lower porf19's unmodeled n-terminal residues (aa 1-21) bind tc's hydrophobic groove, enhancing catc-tc binding.
finally, we mutated residues in the hydrophobic triplex groove, creating six mutants: tri1-i280r, tri1-l278r/i280r/l283e, tri2-a216r, tri2-l217r, tri2-a216r/l217r, and tri2-v244r (figure 5i). these mutants affect binding grooves indiscriminately at both ta and tc triplexes (and, therefore, catc binding at both its vertex-proximal and vertex-distal ends). all mutants yielded decreased virion production (figure 5n), whereas neither viral dna replication nor rna transcription was significantly affected (figures 5o and 5p). impressively, virion production of mutant tri2-a216r/l217r was under the detection limit (figure 5n, ud). the visualization of porf32 and porf19 peptide fragments binding in the respective deep grooves of triplex ta and tc (figures 5a-5i) and our demonstration of their importance in viral replication (figures 5j-5p) open exciting prospects for future kshv inhibitor design and drug development. currently, most available drugs countering herpesvirus infections are nucleic acid analogs that target viral dna synthesis. unfortunately, these often elicit negative side effects, including frequent induction of drug-resistant mutant viruses and varying kinds of toxicity (gilbert et al., 2002; reusser, 1996; walling et al., 2003). that the dna-packaging portal vertex is critically involved in the production of virus progeny and that the five portal vertex catc registers exhibit full catc occupancy (which suggests that disrupting a single portal-associated catc might abolish the assembly of infectious virions) presents the possibility of a novel antiviral target without homology with cellular proteins. therefore, a catc-specific inhibitor would be very potent, and the deep groove on triplexes can serve as a structurally informed target for focused antiviral development.
[figure 5 legend, continued: (d) the al (ribbon-and-stick superimposed with mesh density) binds a hydrophobic groove between the tri1 and tri2 dimer (cyan, hydrophilic, to maroon, hydrophobic). (e) the al forms β sheet-like hydrogen bonds (dotted lines) with an adjacent tri1 strand (some side chains are hidden for clarity). (f and g) consecutive enlarged views showing catc-tc interaction via upper porf19, which interacts with tc tri1 via a helical motif (g). dotted lines represent hydrogen bonds. lower porf19 facilitates two sets of β sheets (starred in f and h) at the catc's vertex-distal end. (h and i) the flexible n-terminal residues of upper and lower porf19 could not be modeled, although the disordered density from the two porf19 copies can be traced (h, dashed lines) and observed (i, gray density). (j-m) structure-guided mutageneses of porf32 (j and k) and porf19 (l and m), confirming the importance of their respective triplex-binding segments in infectious virion production. overexpressing porf32 mutants with deletions of n-terminal helix (dnh) and/or anchoring loop (dal) in kshv lytic-replicating cells reduced virion production normalized to the control (vector) (j). likewise, overexpressing porf19 with truncated n-terminal 17 or 28 residues reduced virion production (l). expression of mutant porf32 (k) and porf19 (m) was verified by western blotting with an anti-flag antibody. expression of cellular gapdh was examined as an internal control. (n-p) mutageneses of triplex protein residues at the catc-binding groove. measures of viral titer (n), genome replication (o), and rna transcription (p) are shown. the dashed line in (n) indicates the detection limit. data are mean ± sem (n = 3 biologically independent samples). see also figure s6 and video s3.]
the portal is widely believed to serve as the nucleating nexus of herpesvirus procapsid formation (newcomb et al., 2005), although capsid-like particles have been observed in capsid protein-expressing insect cells and cell-free systems in the absence of portal protein (newcomb et al., 1994; perkins et al., 2008; tatman et al., 1994), by facilitating initial interactions with scaffold-tethered mcp subunits and portal-adjacent triplexes (deng et al., 2008; zhou et al., 1998). a metastable spherical procapsid then forms as follows. tethering by the scaffolding protein brings together additional mcp molecules. the scaffolding core ensures proper curvature and size of the capsid in what is known as the "rope" mechanism (deng et al., 2008), and heterotrimeric triplexes, with the n-terminal anchor of tri1 (porf62) traversing the capsid shell, "plug" every 3-fold hole between mcp capsomers. our data showing that catc-triplex ta association correlates with a 120° apical rotation of ta (figures 4i and 4j), but not the 3-fold-related 240°, eliminating the possibility of this being a stochastic event, indicate that the determination of catc binding (and, likely, binding of catc itself or at least a subunit of catc; thurlow et al., 2006) occurs not long after procapsid formation, when the triplex's apical orientation has yet to be fixed by procapsid maturation. procapsid maturation is concomitant with dna packaging; head-full is likely sensed through capsid elements and relayed by the portal through the turret-like structure to the externally located terminase for cleavage of the concatemeric genome (yang et al., 2007), and the spring-loaded dsdna genome is corked inside the capsid by the portal cap, putatively formed by five portal-adjacent catcs (figure 1b).
that capsid structures are rigidly symmetric reflects the capsid's well-defined and conserved role in packaging and protecting the viral genome, in marked contrast to both form and function of the largely pleomorphic tegument layer. accordingly, all three sub-families of herpesviruses have highly conserved capsid proteins and structures but only partially conserved tegument and envelope proteins because these more often reflect specific host cell adaptations. among tegument proteins, catc is unique in possessing a structural role and, thus, being relatively conserved, but even so, a surprising finding here is that, unlike the full occupancy of catc at capsid vertices in neurotropic hsv-1 and hsv-2 (dai et al., 2014; wang et al., 2018) , catc occupancy in kshv at penton vertices is only partial and varies even among kshv capsids (figures 2 and 3) . that kshv catc occupancy appears to be dictated at least in part by the portal not only underscores the portal's aforementioned nucleation role but further spotlights its allosteric effect in defining an emergent first level of variability that, importantly, demarcates a departure from symmetry in kshv metastructure. a second constructive level of variability is facilitated by porf64, which accounts for two of the five subunits in catc and is the largest tegument protein, with 2,635 residues folded into multiple domains joined by presumably flexible linkers (the vast majority of porf64 is thus invisible in our structure). some of these domains recruit other tegument proteins and bind the endodomains of envelope glycoproteins (rozen et al., 2008; sathish et al., 2012) , introducing pleomorphic variability. 
last, the majority of tegument proteins, including porf64, have been shown to be capable of being packaged into non-infectious virus-like vesicles in the absence of capsids (gong et al., 2017) , suggesting an alternate pathway of tegument incorporation into virions independent of the capsid/catc, implicating yet another level of assembly-driven variability. in light of the tegument's primary role in manipulating host cells to facilitate virus replication, the multiple layers of structural variability at play in the tegument provide unique opportunities in the context of herpesvirus adaptability. in much the same way that more genetically fluid rna viruses such as influenza viruses (harris et al., 2006; vahey and fletcher, 2019) , filoviruses (bharat et al., 2011) , and coronaviruses (goldsmith et al., 2004) benefit from heterogeneous compositions to rapidly respond to selective pressures, structural pleomorphism may provide a crucial avenue of diversity in more genetically constrained dsdna viruses (because of lower mutation rates) like kshv, resulting in an increase in evolutionary bandwidth. in essence, facing a relative lack of intraspecies genetic diversity (the tradeoff is more viable progeny), adaptability regarding the ability to manipulate hosts is, instead, implemented at a structural level. our findings regarding kshv catc occupancy offer a pertinent example: given that hsv pul36 (kshv porf64's homolog) is involved in axonic transport of alphaherpesvirus capsids luxton et al., 2005) , the absence of full catc occupancy in non-neurotropic kshv perhaps reflects its lack of need for long-range neuronal transport. the emergent transition from rigidly symmetric structures to less-structured compartments described here thus sheds light on a delicate balance of conservation and adaptation that delineates the genetic arms race between herpesvirus and host. 
detailed methods are provided in the online version of this paper and include the following:

we thank the bioinformatics center of the university of science and technology of china, school of life science, for providing supercomputing resources for this project. this research has been supported in part by grants from the national key r&d program of china (2017yfa0505300 and 2016yfa0400900) and nih (gm071940, de025567, de028583, de027901, and ai094386 to z.h.z. and ca177322, ca091791, and de023591 to r.s.). we acknowledge the use of resources at the electron imaging center for nanomachines supported by ucla and by instrumentation grants from the nih (1s10rr23057 and 1u24gm116792) and nsf (dbi-1338135 and dmr-1548924).

further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, z. hong zhou (hong.zhou@ucla.edu).

viruses and cell lines

kshv virions were isolated from islk-kshv-bac16 cells, received as a gift from dr. jae u. jung of the university of southern california. dr. jung's group previously established the islk-kshv-bac16 cell line, which supports robust kshv lytic replication and the production of kshv virions as previously described (brulois et al., 2012; dai et al., 2014). islk-kshv-bac16 cells were cultured in dulbecco's modified eagle medium (dmem) supplemented with 10% fetal bovine serum, 1% penicillin-streptomycin, 1 μg/ml puromycin, 250 μg/ml g418, and 1,200 μg/ml hygromycin b. to induce kshv lytic replication and thus kshv virion production, cells were treated with 1 μg/ml doxycycline plus 1 mm sodium butyrate (nab) for three days, after which tissue culture supernatant was collected. kshv virions were pelleted by ultracentrifugation at 80,000 g for 1 hour, then resuspended in phosphate buffered saline (pbs) and further purified in a 15%-50% (w/v) sucrose density gradient sedimented at 100,000 g for 1 hour.
in addition to cryoem studies, these kshv virions were also used to infect human 293t cells at a moi of 3. 293t cells infected with wild-type kshv virions (termed 293t-kshv) or transfected with triplex mutant kshv bacs were selected and maintained in dmem supplemented with 10% fetal bovine serum, 1% penicillin-streptomycin, and 100 μg/ml hygromycin b. these 293t cells harboring kshv wild-type or triplex protein mutants were used for all functional analyses as described below in method details.

cryoem and icosahedral reconstruction

aliquots of 2.5 μl purified virion sample were applied onto quantifoil r2/1 cu grids, manually blotted with filter paper, and plunge-frozen in liquid ethane. super-resolution movies of purified wild-type kshv intact virions were recorded on a gatan k2 direct electron detection camera in counting mode with a pixel size of 1.03 å/pixel at the specimen scale. the 26 frames in each movie were subjected to drift correction using motioncorr (li et al., 2013) and averaged to produce one micrograph. defocus for each micrograph was determined by ctffind3 (mindell and grigorieff, 2003), and a total of 44,328 viral particles were picked manually. because the size of boxed particles (1,440 × 1,440 pixels) was so large that the cumulative dataset required an unrealistic amount of computer memory for computation, boxed particles were normalized and binned four times prior to implementing standard icosahedral reconstruction procedures in relion (scheres, 2012). using a gaussian ball as the initial reference, auto-refinement for icosahedral reconstruction was performed imposing i3 symmetry, generating an icosahedral map with one of the twelve 5-fold axes aligned on the z axis.

sub-particle extraction from icosahedron vertices and local focus value calculation

as illustrated in figure s1, we extracted twelve sub-particles corresponding to the twelve capsid vertices for each kshv virion based on the icosahedral orientation described above.
to do so, we first expanded the icosahedral symmetry of the particles using relion_particle_symmetry_expand, generating 60 icosahedrally-related orientations for each particle. each orientation has three euler angles denoted as parameters within the relion star files: rot (_rlnanglerot), tilt (_rlnangletilt), and psi (_rlnanglepsi). we then defined the orientation for each of the twelve vertices from the 60 icosahedrally-related orientations as follows. first, we noted that because our icosahedral reconstruction was performed using i3 symmetry, there are 5 redundant orientations relative to each vertex that differ only in their rot angles (i.e., the in-plane rotation angle about the z axis). for this reason, we classified the 60 orientations into twelve groups of five orientations each, with the orientations within a group differing only in their rot angles. we then randomly chose one of the five orientations in each group as the orientation of that vertex, thereby defining the orientations for all twelve vertices of each capsid. next, we determined the location of each vertex sub-particle on the viral particle image. the two-dimensional cartesian positions (x, y) of each sub-particle on their respective particle image were calculated using the following formula:

$$x = \cos(\mathrm{psi})\,\sin(\mathrm{tilt})\,d + c - \mathrm{ox}, \qquad y = -\sin(\mathrm{psi})\,\sin(\mathrm{tilt})\,d + c - \mathrm{oy} \tag{1}$$

where d is the distance from the center of the reconstructed capsid to the vertex in pixels and c is the center of the 2d projection image (in our case, the projection center is at [720, 720], so c = 720 pixels). because icosahedral reconstruction was performed with four times-binned particles, ox and oy are four times the offset distance (_rlnoriginx and _rlnoriginy in relion) of each particle image relative to the projection center of the icosahedral reconstruction.
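as a minimal sketch of formula (1), the function below maps one vertex orientation to pixel coordinates on the unbinned particle image. the function name, argument names, and the example value of d are hypothetical and are not taken from the deposited scripts; the factor of four on the offsets follows the four times-binned reconstruction described above, and angles are assumed to be given in degrees as in relion star files.

```python
import math


def vertex_position(psi_deg, tilt_deg, d, c, origin_x, origin_y, bin_factor=4):
    """Apply formula (1) to one vertex orientation.

    psi_deg, tilt_deg: Euler angles in degrees (as stored in a STAR file).
    d: distance from capsid center to vertex, in unbinned pixels.
    c: center of the 2D projection image, in unbinned pixels.
    origin_x, origin_y: particle offsets from the binned refinement.
    bin_factor: scales the offsets back to unbinned pixels (assumed 4 here).
    """
    psi = math.radians(psi_deg)
    tilt = math.radians(tilt_deg)
    ox = bin_factor * origin_x
    oy = bin_factor * origin_y
    x = math.cos(psi) * math.sin(tilt) * d + c - ox
    y = -math.sin(psi) * math.sin(tilt) * d + c - oy
    return x, y


# Illustrative call: d = 500 pixels is a made-up value, c = 720 as in the text.
example_x, example_y = vertex_position(psi_deg=0.0, tilt_deg=90.0, d=500.0,
                                       c=720.0, origin_x=0.0, origin_y=0.0)
print(example_x, example_y)
```

a vertex lying on the projection axis (tilt = 0) maps to the image center shifted only by the particle offsets, which is a quick sanity check on the signs in the formula.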
finally, sub-particles (384 × 384 pixels) containing only vertices, henceforth termed ''vertex sub-particles,'' were extracted from original unbinned particle images based on their calculated positions using relion_preprocess without further normalization. our sub-particle reconstruction method also enabled a more accurate determination of the defocus for each sub-particle, thus alleviating the well-documented depth-of-focus problem (derosier, 2000; zhang and zhou, 2011). the defocus value of each vertex sub-particle was calculated based on its location with the following formula: where dz0 is the original defocus and dz is the new defocus for each vertex.

classification and refinement of vertex sub-particles with 5-fold symmetry

to identify the unique portal vertex from among the 12 icosahedral vertices for each virus, we classified all vertex sub-particles with 5-fold symmetry (figure s1). no rotational orientation search was allowed during classification (using the --skip_rotate parameter in relion), though the center for each vertex sub-particle was refined with a ±3 pixel offset search. the initial reference for classification was a 30-å reconstruction of the vertex sub-particles using relion_reconstruct. although the portal vertex lacks true 5-fold symmetry, this classification with 5-fold symmetry imposed successfully distinguished between penton and portal vertices. through 50 iterations of 3d classification, four classes were ultimately generated, with one class in particular exhibiting markedly different structures (i.e., a blurry central channel with a rod-like density) (figure s1). additionally, this class contained 7.9%, or approximately 1/12th, of the vertex sub-particles, consistent with the expectation that one out of twelve capsid vertices in each particle is a portal vertex. we thus considered this class the portal vertex class.
in rare instances, two or more vertices from a capsid were classified into the portal vertex class, likely due to the low quality of these individual particles and/or errors in classification. these redundant sub-particles were removed as follows: if two or more vertices from the same virus particle were assigned to the portal vertex class, only the vertex subparticle with the highest _rlnmaxvalueprobdistribution score was retained. upon removing all redundant particles, 39,773 vertex sub-particles remained and were deemed sub-particles of the portal vertex, henceforth referred to as ''portal vertex sub-particles.'' 3d auto-refinement with 5-fold symmetry imposed was then performed on these portal vertex sub-particles with only a local search for orientation determination. using relion_postprocess, the final resolution of this c5 reconstruction was calculated with two independently refined maps from halves of the dataset with gold-standard fsc at the 0.143 criterion (rosenthal and henderson, 2003) , and determined to be 4.3-å ( figures s2a and s2d ). this reconstruction of the portal vertex contains a well-resolved 5-fold-arranged capsid and tegument, but a smeared portal dodecamer density due to symmetry mismatch. reconstructing the portal dodecamer with 12-fold symmetry from the portal vertex sub-particles, we further extracted sub-particles containing only the portal dodecamer in order to reconstruct the 12-fold symmetric portal. the positions of portal dodecamers on portal vertex sub-particles were determined using the above formula (1). the euler angles (rot, tilt, and psi), ox, and oy are the orientation parameters of the portal vertex sub-particles; d is the distance from the center of the dodecamer to the center of the portal vertex sub-particle reconstruction (-100 pixels); and c is the center of the 2d projection image of the portal vertex sub-particle (192 pixels). 
the sub-particles of portal dodecamers (192 × 192 pixels), henceforth referred to as ''dodecamer sub-particles,'' were then extracted with relion_preprocess using these parameters. to reconstruct the portal dodecamer, we expanded the 5-fold symmetry of the dodecamer sub-particles using relion_particle_symmetry_expand, generating five unique orientations for each dodecamer sub-particle. we then applied 3d classification with 12-fold symmetry imposed and without rotational orientation search, which, after 100 iterations, yielded five classes of similar structures with a rotational difference of approximately 72° between classes. ideally, each of the five expanded orientations of a dodecamer sub-particle should be assigned to exactly one of the five classes, such that each class should contain 20% of the symmetry-expanded sub-particles. after removing redundant particles as previously described (only particles with the highest _rlnmaxvalueprobdistribution score were retained), the five classes contained 39,073, 37,753, 36,797, 37,270, and 38,031 particles, respectively. as the five reconstructed classes were of the same quality upon visual inspection, we chose the class with the most abundant particles for 3d refinement with 12-fold symmetry imposed and limited to local orientation search. as with the previous reconstruction, the resolution of this c12 portal dodecamer reconstruction was calculated with relion_postprocess using gold-standard fsc at the 0.143 criterion (rosenthal and henderson, 2003), and determined to be ~4.7-å. however, both a visual assessment of the portal's density and the local resolution estimate derived from resmap (kucukelbir et al., 2014) indicate that the majority of the portal itself has a resolution of ~4.0-å (figures s2a and s2e), thereby permitting ab initio modeling.
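the redundancy-removal rule used for both the portal vertex class and the dodecamer classes (keep, per parent particle, only the entry with the highest _rlnmaxvalueprobdistribution score) can be sketched as below. this is an illustrative sketch, not the authors' script: the record layout and the field names particle_id and max_value_prob are hypothetical stand-ins for whatever identifies the parent particle and holds the score in the star file.

```python
def keep_best_per_particle(rows, key="particle_id", score="max_value_prob"):
    """Keep one row per parent particle: the one with the highest score.

    rows: iterable of dicts, each describing one sub-particle assigned to a class.
    key: field identifying the parent particle (hypothetical name).
    score: field holding the per-entry probability score (hypothetical name,
           standing in for _rlnMaxValueProbDistribution).
    """
    best = {}
    for row in rows:
        pid = row[key]
        # Replace the stored row only when the new one scores strictly higher.
        if pid not in best or row[score] > best[pid][score]:
            best[pid] = row
    return list(best.values())


# Toy input: two vertices from virion "v1" landed in the same class.
candidates = [
    {"particle_id": "v1", "vertex": 3, "max_value_prob": 0.41},
    {"particle_id": "v1", "vertex": 7, "max_value_prob": 0.88},
    {"particle_id": "v2", "vertex": 1, "max_value_prob": 0.65},
]
deduplicated = keep_best_per_particle(candidates)
print(deduplicated)
```

ties are resolved in favor of the first entry encountered, which is an arbitrary choice made for this sketch.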
the lower resolution estimate obtained from fsc calculation likely results from the unresolved, flexible regions of the portal and/or the surrounding dna and protein densities that deviate from 12-fold symmetry, which are present in the map and therefore factored into the estimation. since each portal orientation determined from the previous round of portal dodecamer classification was selected from one of the five expanded orientations of each portal vertex sub-particle, these orientations can be used for 3d refinement of the portal vertex and whole virion without symmetry. the asymmetric auto-refinement for both portal vertex sub-particles and virion particles was thus performed with a local search for orientations determined from the classification of the portal dodecamer. due to the large computational requirement for refinement of the whole virion, we performed this refinement using two times-binned particles. the resolution of the portal vertex and whole virion c1 reconstructions was determined by relion_postprocess to be 5.2-å and 7.6-å , respectively ( figures s2a, s2b , s2f, and s2g), according to gold-standard fsc at the 0.143 criterion (rosenthal and henderson, 2003) . focused classification of symmetry-relaxed penton vertex sub-particles our classification of vertex sub-particles identified not only the portal vertex class, but also generated three classes of penton vertex ( figure s1 ). the sub-particles of these three classes were combined and deemed ''penton vertex sub-particles.'' due to the 5-fold symmetry surrounding capsid vertices, catc can bind to any of five registers at the penton vertex. to determine the structures of both catc-bound and catc-absent registers, we expanded the 5-fold symmetry of penton vertex sub-particles by relion_particle_symmetry_expand, producing 2,450,245 symmetry-expanded sub-particles of penton vertex. 
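the particle counts reported in these methods can be cross-checked with simple arithmetic. the sketch below assumes that all twelve vertices of every picked virion were extracted and classified, which the text implies but does not state outright; the variable names are illustrative only.

```python
# Counts quoted in the methods text.
N_VIRIONS = 44_328              # manually picked viral particles
VERTICES_PER_CAPSID = 12
EXPANDED_PENTON = 2_450_245     # symmetry-expanded penton vertex sub-particles
SYMMETRY_ORDER = 5              # C5 expansion yields five copies per vertex
PORTAL_AFTER_DEDUP = 39_773     # portal vertex sub-particles kept

# Assumption: every virion contributed exactly twelve vertex sub-particles.
total_vertices = N_VIRIONS * VERTICES_PER_CAPSID

# Undo the 5-fold expansion to recover the number of penton vertices.
penton_vertices, remainder = divmod(EXPANDED_PENTON, SYMMETRY_ORDER)

# Whatever is not a penton vertex was assigned to the portal class.
portal_class_before_dedup = total_vertices - penton_vertices
portal_fraction = portal_class_before_dedup / total_vertices
removed_as_redundant = portal_class_before_dedup - PORTAL_AFTER_DEDUP

print(f"total vertices:            {total_vertices}")
print(f"penton vertices:           {penton_vertices} (remainder {remainder})")
print(f"portal class before dedup: {portal_class_before_dedup}")
print(f"portal class fraction:     {portal_fraction:.1%}")
print(f"removed as redundant:      {removed_as_redundant}")
```

under that assumption, the portal class fraction comes out at about 7.9%, in line with the figure quoted for the vertex classification, and the gap between the portal class size and the 39,773 retained sub-particles corresponds to the redundant entries that were discarded.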
to create a mask for focused classification of catc, one catc-containing region with its corresponding triplex ta was manually traced using volume tracer in chimera, after which a mask encompassing the traced region was created by relion_mask_create in relion. focused classification of the masked region was then performed on symmetry-expanded penton vertex sub-particles with neither angular nor offset search (using the --skip_align parameter in relion). since classification was performed on such a small area relative to the whole reconstruction, we specified a tau factor of 10 during classification (scheres, 2016). after 90 iterations, four classes were generated, among which only one class had apparent catc density, corresponding to 37.9% of symmetry-expanded masked regions, consistent with the occupancy calculated from a previous study (dai et al., 2014). we therefore considered this a catc-binding class. the other three classes, though of slightly differing map quality, clearly lacked catc density while sharing the same triplex ta orientation. we therefore regarded these three classes as catc-absent classes.

gag gac cac-3′, and also cloned into redtrackcmv vector to generate an n-terminal flag-tagged expression plasmid (19wt). porf32 mutants and porf19 mutants were generated by pcr-based deletion mutagenesis from 32wt and 19wt, respectively. sequences of the pcr-amplified fragments and their correct insertion in the plasmid were verified by sequencing.

the concentration of infectious kshv virions was determined as previously described (gong et al., 2016). briefly, 293t-kshv cells were transfected with tegument protein expression plasmids or the empty vector as control. at 16 h post-transfection, cells were treated with 0.5 mm nab plus 25 ng/ml 12-o-tetradecanoylphorbol-13-acetate (tpa) to induce kshv lytic replication.
three days later, supernatants were collected, centrifuged at 10,000 g for 10 min at 4 °c to remove cellular debris, serially diluted in dmem with 10% fbs, and then used to infect 293t cells in 96-well plates by spinoculation (3,000 g for 1 h at 30 °c). two days post-infection, gfp-positive cell clusters containing two or more cells were counted under a fluorescence microscope to determine the titer of kshv virions. infectious units (iu) are expressed as the number of gfp-positive cell clusters in each well at a specific dilution of the viral stock.

measuring viral dna replication and rna transcription by real-time pcr

total dna was isolated from 293t-kshv cells induced with nab plus tpa, and viral genome copy numbers were determined by real-time pcr using primers for the essential viral gene orf59. total rna was extracted from cells with purelink rna mini kit (thermo fisher scientific), treated with dnase i, then reverse-transcribed using superscript iii reverse transcriptase (thermo fisher scientific) and random hexamers. real-time pcr was then performed with the following primers to detect the corresponding dna or rna transcripts. host house-keeping gene gapdh:

western blotting and antibodies

cells were lysed in 1x western blotting loading buffer, resolved by sds-page, and transferred onto pvdf membrane. proteins were detected with antibodies against flag-epitope (sigma-aldrich) or gapdh (abcam). hrp-conjugated secondary antibodies (cell signaling technology) were used for detection with supersignal west femto maximum sensitivity substrate (thermo fisher scientific).

construction of tri1 and tri2 kshv mutants

kshv-bac16 plasmid was modified according to a previously described method (brulois et al., 2012; gong et al., 2016). briefly, dna fragments of kshv orf62 (tri1) and orf26 (tri2) with defined mutations were used to replace the wild-type sequence in kshv-bac16 plasmid by homologous recombination.
restriction patterns of mutated kshv bac plasmids were verified by comparison to that of wild-type kshv-bac16 plasmid to ensure overall genome integrity. fragments containing mutations were pcr-amplified from bac plasmids and sequenced to confirm that all mutations were correct. mutant bac plasmids were transfected into 293t cells individually, followed by selection with 100 μg/ml hygromycin b for one month to generate cell lines latently infected by a specific kshv mutant virus. as described above, kshv lytic replication was induced by treatment of cells with 0.5 mm nab plus 25 ng/ml tpa. three days later, supernatants were collected for determining titers of infectious kshv virions, while cells were harvested for measuring viral dna replication and rna transcription as described above.

statistical comparisons between groups were made using student's t test calculated in microsoft excel. data and error bars displayed for measured relative virion production, viral titer, genome replication levels, and rna transcription levels (shown in figures 5j, 5l, 5n-5p, and s6a-s6d) are presented as mean ± sem (n = 3 biologically independent samples).

six cryoem maps generated during this study have been deposited in the electron microscopy data bank (emdb) and are available under accession numbers emd-20430 (c1 virion capsid reconstruction), emd-20431 (c1 portal vertex reconstruction), emd-20432 (c5 portal vertex reconstruction), emd-20437 (c12 portal reconstruction), emd-20433 (c1 penton vertex register, catc-absent reconstruction), and emd-20436 (c1 penton vertex register, catc-binding reconstruction). atomic models corresponding to emd-20432, emd-20437, emd-20433, and emd-20436 have been deposited in the protein data bank (pdb) and are available under accession numbers pdb-6ppb, pdb-6ppi, pdb-6ppd, and pdb-6pph, respectively.
sub-particle reconstruction scripts used in our workflow have been deposited on github and can be accessed here: https://github.com/procyontao/herpesportal. figure s1 . sequential localized classification and sub-particle reconstruction, related to table 1 flowchart illustrates the application of sequential localized classification and reconstruction to resolve symmetry-mismatched structures of portal and penton vertices. table 1 and star methods (a-c) gold-standard fsc curves of all cryoem reconstructions. based on the 0.143 criterion, the resolutions of our c1 portal vertex, c5 portal vertex, and c12 portal reconstructions are 5.2-å , 4.3-å , and 4.7-å , respectively. the resolution of our c1 capsid reconstruction is 7.6-å , and the resolution of our penton vertex reconstructions with and without catc are 3.8-å and 3.7-å , respectively. (d-i) density maps colored by local resolution estimated from resmap (kucukelbir et al., 2014) . note that despite an fsc estimated resolution of 4.7-å , the vast majority of our c12 map reached a resolution of 4.0-å or better, thus enabling atomic model building. figure s3 . bioinformatics predictions and analyses, related to figures 1 and 4 (a) secondary structure and disorder prediction for porf43 obtained from phyre2 (kelley et al., 2015) , annotated by porf43 atomic model domains for reference as per key. (b) structure-based pairwise alignment of hsv-1 pul17 and kshv porf32, performed using matchmaker in chimera . porf19 is only visible in the unsharpened map. boxed regions correspond to colored inset boxes illustrating residue features (ribbon-and-stick) in density (mesh). (b) c1 reconstruction of penton vertex with one catc bound, colored as per key. dotted black circle denotes density putatively assigned to the second porf19 head domain (density shown at a lower threshold than that of surrounding features). 
inset displays the c1 reconstruction in (a) fitted within the same reconstruction gaussian-filtered at 3.0σ to showcase connectivity between the weaker (green) putative porf19 head domain and catc helix bundle. (c) catc density shown with both porf19 head domains (at lower thresholds). note that a helical density feature can be observed in the putative (green) porf19 head domain density.

figure s6. overexpressing porf32/porf19 mutants does not significantly affect kshv dna replication or gene expression, related to figure 5
(a-b) viral genome replication in cells overexpressing wild-type or mutant forms of porf32 (a) or porf19 (b). 293t-kshv cells were transfected with corresponding expression plasmids or empty vector as control, and then induced with nab and tpa to facilitate kshv lytic replication. total dna was extracted from cells, and genome replication was determined by qpcr. (c-d) viral rna transcription in cells transfected with wild-type or mutant forms of porf32 (c) or porf19 (d). total rna was extracted from the same cells as in (a-b). viral rna transcripts were quantified by rt-qpcr and are presented as x-fold changes over the rna level of empty vector-transfected cells. data are mean ± sem (n = 3 biologically independent samples).
phenix: a comprehensive python-based system for macromolecular structure solution herpes simplex virus dna packaging sequences adopt novel structures that are specifically recognized by a component of the cleavage and packaging machinery dna cleavage and packaging proteins encoded by genes u(l) cryo-electron tomography of marburg virus particles and their morphogenesis within infected cells structural characterization of the ul25 dna-packaging protein from herpes simplex virus type 1 construction and manipulation of a new kaposi's sarcoma-associated herpesvirus bacterial artificial chromosome clone visualization of the herpes simplex virus portal in situ by cryo-electron tomography procapsid assembly, maturation, nuclear exit: dynamic steps in the production of infectious herpesvirions identification of herpesvirus-like dna sequences in aids-associated kaposi's sarcoma electron cryotomography reveals the portal in the herpesvirus capsid organization of capsid-associated tegument components in kaposi's sarcoma-associated herpesvirus structure and mutagenesis reveal essential capsid protein interactions for kshv replication cryo-electron tomography of kaposi's sarcoma-associated herpesvirus capsids reveals dynamic scaffolding structures essential to capsid assembly and maturation correction of high-resolution data for curvature of the ewald sphere features and development of coot kshv and the pathogenesis of kaposi sarcoma: listening to human biology and medicine kshv: pathways to tumorigenesis and persistent infection resistance of herpesviruses to antiviral drugs: clinical impacts and molecular mechanisms ultrastructural characterization of sars coronavirus a herpesvirus protein selectively inhibits cellular mrna nuclear export virus-like vesicles of kaposi's sarcoma-associated herpesvirus activate lytic replication by triggering differentiation signaling influenza virus pleiomorphy characterized by cryoelectron tomography herpesvirus capsid assembly and dna 
packaging the c terminus of the herpes simplex virus ul25 protein is required for release of viral genomes from capsids bound to nuclear pores the phyre2 web portal for protein modeling, prediction and analysis quantifying the local resolution of cryo-em density maps structural framework for dna translocation via the viral portal protein electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-em a pul25 dimer interfaces the pseudorabies virus capsid and tegument different capsid-binding patterns of the b-herpesvirusspecific tegument protein pp150 (pm32/pul32) in murine and human cytomegaloviruses portal protein functions akin to a dna-sensor that couples genome-packaging to icosahedral capsid maturation targeting of herpesvirus capsid transport in axons is coupled to association with specific sets of tegument proteins structure of the herpes simplex virus portal-vertex accurate determination of local defocus and specimen tilt in electron microscopy a viral scaffolding protein triggers portal ring oligomerization and incorporation during procapsid assembly cell-free assembly of the herpes simplex virus capsid involvement of the portal at an early step in herpes simplex virus capsid assembly role of the ul25 gene product in packaging dna into the herpes simplex virus capsid: location of ul25 product in the capsid and demonstration that it binds dna tegument assembly and secondary envelopment of alphaherpesviruses herpesvirus capsid association with the nuclear pore complex and viral dna release involve the nucleoporin can/nup214 and the capsid protein pul25 small capsid protein porf65 is essential for assembly of kaposi's sarcoma-associated herpesvirus capsids ucsf chimera-a visualization system for exploratory research and analysis herpesvirus resistance to antiviral drugs: a review of the mechanisms, clinical importance and therapeutic options seeing the portal in herpes simplex virus type 1 b capsids optimal 
determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy virion-wide protein interactions of kaposi's sarcoma-associated herpesvirus tegument proteins of kaposi's sarcoma-associated herpesvirus and related gamma-herpesviruses relion: implementation of a bayesian approach to cryo-em structure determination processing of structurally heterogeneous cryo-em data in relion cryo-em structure of the bacteriophage t4 portal protein assembly at near-atomic resolution assembly of herpes simplex virus type 1 capsids using a panel of recombinant baculoviruses herpes simplex virus type 1 dna-packaging protein ul17 is required for efficient binding of ul25 to capsids capsid structure of kaposi's sarcoma-associated herpesvirus, a gammaherpesvirus, compared to those of an alphaherpesvirus, herpes simplex virus type 1, and a betaherpesvirus, cytomegalovirus low-fidelity assembly of influenza a virus promotes escape from host cells epstein-barr virus replication in oral hairy leukoplakia: response, persistence, and resistance to treatment with valacyclovir structure of the herpes simplex virus type 2 c-capsid with capsid-vertex-specific component protein structure modeling with modeller threedimensional structure of the human herpesvirus 8 capsid atomic structure of the human cytomegalovirus capsid with its securing tegument layer of pp150 building atomic models based on near atomic resolution cryoem maps with existing tools limiting factors in atomic resolution cryo electron microscopy: no simple tricks identification of the sites of interaction between the scaffold and outer shell in herpes simplex virus-1 capsids by difference electron imaging four levels of hierarchical organization, including noncovalent chainmail, brace the mature tumor herpesvirus capsid against pressurization to reiterate, capsid vertices can theoretically be bound by any number of catcs from zero up to five. 
moreover, in cases of two catcs bound to a penton vertex, the two catcs can either occupy registers adjacent to each other or registers with a gap (i.e., an empty register) in between. we named these two binding patterns ''ortho-catc-binding'' and ''meta-catc-binding,'' respectively. in instances of three catcs bound to a penton vertex, the two vacant registers can either be adjacent or have a gap (i.e., an occupied register) between them, analogous to the case of two vertex-bound catcs. we thus named these two permutations ''ortho-catc-absent'' and ''meta-catc-absent,'' respectively. altogether, eight classes of penton vertices are possible. these are: zero-, one-, four-, and five-catc-binding vertices; ortho- and meta-catc-binding vertices; and ortho- and meta-catc-absent vertices.

classified penton vertex sub-particles were then used for subsequent auto-refinement. given that the five expanded orientations for each vertex were already each assigned to either a catc-binding or catc-absent class (due to masked classification of the catc-binding region), orientations for vertex sub-particles were determined as follows. the orientations for zero- and five-catc-binding vertex sub-particles were randomly selected from their five orientations in the catc-absent and catc-binding classes, respectively. orientations of one-catc-binding and four-catc-binding vertices were chosen as their single orientation in the catc-binding and catc-absent classes, respectively. where each vertex bound two catcs, orientations were chosen as one of the two orientations in the catc-binding class, such that the second (i.e., non-chosen) orientation was a 72° or 144° counterclockwise rotation from the chosen orientation, for ortho- and meta-catc-binding vertices, respectively. similarly, the orientations for ortho- and meta-catc-absent classes were chosen such that the non-chosen catc-absent orientation was a 72° or 144° counterclockwise rotation from the assigned orientation, respectively.
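the eight-way naming scheme above can be expressed as a small function over the five registers of one penton vertex. this is a minimal sketch under the assumption that the per-register catc-binding/catc-absent calls are available as five booleans in circular order around the vertex; the function name and input layout are hypothetical.

```python
def classify_vertex(occupied):
    """Name the binding pattern of one penton vertex.

    occupied: sequence of five truthy/falsy values, one per register, listed in
              circular order around the vertex (True = CATC bound).
    Returns one of the eight class names used in the text.
    """
    if len(occupied) != 5:
        raise ValueError("a penton vertex has exactly five registers")
    flags = [bool(o) for o in occupied]
    n_bound = sum(flags)

    simple = {0: "zero", 1: "one", 4: "four", 5: "five"}
    if n_bound in simple:
        return f"{simple[n_bound]}-catc-binding"

    # Two bound: inspect the two occupied registers.
    # Three bound: inspect the two vacant registers instead.
    target = n_bound == 2
    pair = [i for i, f in enumerate(flags) if f == target]
    step = (pair[1] - pair[0]) % 5
    adjacent = step in (1, 4)  # neighbors on a ring of five registers
    prefix = "ortho" if adjacent else "meta"
    suffix = "catc-binding" if n_bound == 2 else "catc-absent"
    return f"{prefix}-{suffix}"


print(classify_vertex([True, True, False, False, False]))
print(classify_vertex([True, False, True, True, False]))
```

registers 0 and 4 count as adjacent because the five registers form a ring, which is why the step between the two registers of interest is taken modulo five.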
for all auto-refinements, sub-particle orientations were determined by local angular and offset searches about these selected orientations, and initial references were 30-å reconstructions of the sub-particles generated by relion_reconstruct. 5-fold symmetry was imposed during refinements of zero- and five-catc-binding vertices, while refinements of all other binding occupancies were carried out without symmetry. the resolutions of the resulting eight reconstructions, determined by relion_postprocess according to gold-standard fsc at the 0.143 criterion, were 5.8 å, 4.7 å, 6.9 å, and 9.0 å for zero-, one-, four-, and five-catc-binding vertices, respectively; 4.1 å and 5.3 å for ortho- and meta-catc-binding vertices, respectively; and 4.7 å and 5.0 å for ortho- and meta-catc-absent vertices, respectively. as the orientation information of all catc-binding and catc-absent registers and vertex sub-particles was conserved in their respective header files in our workflow, we were able to trace each register and vertex back to its original particle image. this enabled us to conduct a survey of the global number of catcs in each capsid and generate a histogram tallying these data (figure 3b).

all authors reviewed and approved the paper. the authors declare no competing interests.

refinement of the catc-binding and catc-absent structures without symmetry

we next performed separate 3d auto-refinements without symmetry on penton vertex sub-particles in the catc-binding class and in the catc-absent class. orientations for sub-particles were determined by local search around each sub-particle's predetermined orientation from the mask refinement. a 30-å reconstruction of the penton vertex obtained through relion_reconstruct was used as the initial reference for refinements.
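the resolutions in this section are read from fourier shell correlation (fsc) curves at the 0.143 threshold. the sketch below shows, under simplifying assumptions, how such a value can be read from a tabulated curve by linear interpolation. the function name `resolution_at_threshold` and the example numbers are hypothetical; this is not the relion_postprocess implementation, which also handles masking and other corrections.

```python
# Hypothetical helper: spatial frequencies are in 1/angstrom, ascending,
# and fsc_values holds the correlation for each frequency shell.
def resolution_at_threshold(frequencies, fsc_values, threshold=0.143):
    for k in range(1, len(fsc_values)):
        upper, lower = fsc_values[k - 1], fsc_values[k]
        if upper >= threshold > lower:
            f0, f1 = frequencies[k - 1], frequencies[k]
            # Linear interpolation of the frequency where FSC == threshold.
            crossing = f0 + (upper - threshold) * (f1 - f0) / (upper - lower)
            return 1.0 / crossing  # resolution in angstroms
    return None  # the curve never drops below the threshold

# Made-up curve for illustration only (not data from the paper).
example_resolution = resolution_at_threshold([0.1, 0.2, 0.3], [1.0, 0.243, 0.043])
```

only the first downward crossing is reported, so a noisy curve that rises back above the threshold at higher frequency does not change the estimate.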
the final resolutions of the catc-binding and catc-absent reconstructions determined by relion_postprocess were 3.8 å and 3.7 å, respectively, according to gold-standard fsc at the 0.143 criterion (rosenthal and henderson, 2003; figure s2c). of note, because focused classification was performed with respect to the masked area, only structural features within the masked area (which encompasses one catc and an associated triplex ta) of the refined structure are genuine, while the other registers of catc and triplex ta are of mixed occupancy and/or conformations (figure s1) (these will be fully separated in the following procedure). local resolution estimation by resmap (kucukelbir et al., 2014) indicates that the masked area of both the catc-binding and catc-absent reconstructions reached a resolution of ~3.5 å (figures s2h and s2i).

key: cord-334123-wb45ww7f authors: schimmel, paul title: rna pseudoknots that interact with components of the translation apparatus date: 1989-07-14 journal: cell doi: 10.1016/0092-8674(89)90395-4 sha: doc_id: 334123 cord_uid: wb45ww7f nan

it is not a long extrapolation to recognize that the unanticipated structural motifs that were found in trna are underpinnings of the rna pseudoknot. the pseudoknot was proposed as a potentially widespread motif by pleij et al. (1985) and several pieces of experimental data are consistent with its presence in some messenger, ribosomal, and viral rnas (see also rietveld et al., 1983). the present programs for energy minimization of predicted secondary structures exclude the pseudoknot motif (zucker, 1989) but once it was recognized as at least a formal possibility, examples began to appear. most recently, two papers describe experiments that not only support the presence of pseudoknot structures in different mrnas, but also give evidence that these structures are specifically recognized by components of the translation apparatus (tang and draper, 1989; brierley et al., 1989).
a third and earlier study proposed that an rna pseudoknot is recognized by a dna binding protein that autogenously regulates translation of its mrna (mcpheeters et al., 1988).

the elucidation of the three-dimensional structure of trna was a benchmark because, among other contributions, it demonstrated the complexity of molecular form and shape possible with rna molecules. as anticipated, it confirmed the pattern of hydrogen bonds in the cloverleaf secondary structure that was predicted from the sequence. unexpectedly, it also revealed that the single-stranded loops in the proposed cloverleaf secondary structure are not passive elements. instead, they bear nucleotides that are sites for the sophisticated interactions that stabilize the highly differentiated tertiary structure. these interactions include hydrogen bonds that connect the dihydrouridine loop to the tψc and variable loops. the structure also revealed that the four helical stems of the cloverleaf are stacked in pairs to form two longer, continuous helices which are arranged at approximately right angles to form an l-shaped molecule.

pseudoknots are formed from stem-loop structures in which bases outside of a stem-loop are paired with those in the loop so as to create a second stem (figure 1a). the second stem can be stacked upon the first to form a continuous coaxial helix. thus, the hydrogen-bonded articulation of bases in a loop with bases in another part of the rna, and the coaxial stacking of short helical stems to form a single longer helix, are general themes in trnas that are reiterated in pseudoknots. however, the pseudoknot has an added complexity that is not observed in trnas: the coaxial stacking of helices requires that single-stranded connecting nucleotides cross the grooves of the rna double helix (figure 1b). from a three-dimensional model it is clear that one crossing, over the deep groove, can be accomplished with two nucleotide units.
the other crossing, over the shallow groove, requires (for a bridge of two nucleotides) a perturbation of the normal helical and, additionally or alternatively, bridging mononucleotide conformational parameters. tinoco's laboratory has done a careful analytical study of the structure and thermodynamic properties in solution of a nonadecanucleotide that has the potential to form a pseudoknot in which stems of 3 and 4 bp combine to make a 7 bp helix (puglisi et al., 1988) . in the proposed structure, three nucleotides bridge across the major groove and two are used to cross the minor groove. strong evidence for the pseudoknot structure was obtained. the thermodynamic stabilities and hypochromicities of the pseudoknot, of a sequence variant that could not form one of the stems of the pseudoknot, and of the individual stems were investigated. these data showed that the stacking enthalpy was higher for the pseudoknot than for the molecules that contained the individual stems, although not as high as expected for a structure with seven contiguous base pairs. this observation and the reduced hypochromicity of the pseudoknot suggest a distortion in stem and loop portions which could be caused by the presumed two-base crossing of the minor groove. as presented by pleij et al. (1985) the pseudoknot proposal developed from structural mapping and modeling of the 3' end of plant viral rnas that have trna-like properties. thus, tobacco mosaic virus rna, turnip yellow mosaic virus (tymv), and brome mosaic virus rnas are aminoacylated specifically with histidine, valine, and tyrosine, respectively, and each is recognized by the trna nucleotidyl transferase, which adds the cca sequence at the 3'terminus of all trna molecules (reviewed in haenni et al., 1982) . figure 2a shows how the four stems of the trna cloverleaf are combined into two continuous helices that constitute the two arms of the l-shaped three-dimensional structure. 
the acceptor-tψc stems form one continuous minihelix of 12 bp. seven of these pairs are derived from the acceptor and five from the tψc helix. the problem is how to form an equivalent minihelix from sequences, for example, at the 3' terminus of tymv rna. a conventional secondary structure analysis suggested that this segment encodes two stem-loop segments with 5 and 4 bp, respectively. however, by implementation of the pseudoknot format, the desired 12 bp minihelix can be constructed (figure 2b) and combined with adjoining sequences to form a trna-like structure (not shown). whether the minihelix format is sufficient for aminoacylation of tymv or other viral rnas is not known. it is worth noting that, at least for alanine, the acceptor-tψc minihelix and the 7 bp acceptor stem can be charged by the cognate aminoacyl trna synthetase. this is because the major determinant for the identity of an alanine trna is a single base pair that is located in the acceptor helix, so that the rest of the trna structure is dispensable for aminoacylation (hou and schimmel, 1988; francklyn and schimmel, 1989). the results with alanine trna raise the possibility that, for at least some viral rnas, synthetase recognition requires only a pseudoknot structure that bears resemblance to the 3' helical half (or less) and not to the entire trna.

the trna-like structures at the 3' ends of plant viral rnas provide one example where the pseudoknot format may be necessary to form a substrate that is specifically recognized by an enzyme. dreher and hall (1988) have shown that a three base substitution that disrupts part of the proposed acceptor stem of brome mosaic virus rna simultaneously impairs aminoacylation and nucleotidyl transferase activities. to prove the dependence of synthetase recognition on the pseudoknot structure would require evaluation of an ensemble of mutant rnas (cf. hou and schimmel, 1988).
these mutants should be designed to determine whether only those which have compensatory base changes that preserve the pseudoknot can be aminoacylated (cf. dreher and hall, 1988). this is analogous to the phylogenetic approach, which has been successfully used to test predictions of secondary structure in large rna molecules (noller, 1984; james et al., 1988). in the case of aminoacylation of viral rnas, however, it is not just a question of formation of the pseudoknot structure, but of pseudoknot-dependent presentation of nucleotide determinants for protein recognition. consequently, both mutations that disrupt and others that preserve the pseudoknot may inactivate aminoacylation and, therefore, must be classified and studied separately. obviously, this analysis would be more effective if the sites for synthetase recognition in the associated trna were already known, but to date, examples for which the sites for trna identity are well defined have no viral rna counterpart.

figure 2. (a) illustration of the way in which the four stems of the cloverleaf secondary structure of a trna are combined into two minihelices. in the case of e. coli alanine trna, the 12 bp acceptor-tψc minihelix can be efficiently aminoacylated, provided the minihelix encodes a critical g3:u70 base pair (francklyn and schimmel, 1989). (b) generation of a 12 bp minihelix by formation of a pseudoknot. this structure at the 3' end of tymv rna can be combined with adjacent sequences to form a complete trna-like molecule (see pleij et al., 1985).

the first evidence for a pseudoknot motif in protein binding to an mrna came from experiments by mcpheeters et al. (1988), who dissected the translational regulatory site on bacteriophage t4 gene 32 mrna. gene 32 protein binds to single-stranded dna and has a functional role in t4 dna transactions; excess amounts of the protein are free to bind to the operator region of its mrna and thereby block initiation of translation (gold, 1988).
by a combination of methods which were applied to free and complexed rna, nucleation of binding was suggested to be promoted by a pseudoknot that is approximately 40 nucleotides upstream of the initiation codon. a comparison with the mrna sequences of the related t2 and t8 phages provided phylogenetic evidence for conservation of the pseudoknot structure (mcpheeters et al., 1988). to understand the exact structure required for presentation of the gene 32 protein binding site, direct and quantitative measurements of protein binding to an rna with the presumed pseudoknot, and to mutants that alter its structure, will have to be carried out.

two recent studies have used mutational analysis to demonstrate the role of pseudoknots in entirely different systems. one is an investigation of protein recognition of a proposed pseudoknot motif in the 5' region of the e. coli α operon mrna (tang and draper, 1989). this polycistronic mrna encodes four ribosomal proteins and the α subunit of rna polymerase. ribosomal protein s4 is one of the encoded proteins, and it regulates the translation of this mrna. s4 and other ribosomal proteins that are known to be translational repressors bind to a specific mrna structure and also have a binding site on 16s or 23s ribosomal rna (lindahl and zengel, 1986; thomas et al., 1987). previous structural mapping experiments by draper and co-workers suggested a model for the 5' region of the mrna which constitutes the s4 binding site; it envisions a hairpin helix and loop that encompasses nucleotides 19 to 72. the loop sequence GGGC at positions 49-52 is proposed to pair with the downstream sequence GCCC at positions 98-101 so as to create a pseudoknot. the structural model and its relevance to s4 binding were tested by construction of mutants that alternately disrupt and restore (by compensatory changes) the presumed pseudoknot.
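the disrupt-and-restore logic of such compensatory mutants can be sketched in a few lines. the example assumes strict watson-crick pairing between antiparallel strands (g-u wobble pairs are ignored), and the function names are hypothetical. the two mutant sequences are taken to be substitutions of the first two loop bases by CC and of the last two downstream bases by GG, which is an assumed reading of the mutants described in the text.

```python
# Watson-Crick complements only; G-U wobble pairing is deliberately ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the strand that pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def can_pair(loop, downstream):
    """True if the two 5'-to-3' segments form a fully paired antiparallel stem."""
    return downstream == reverse_complement(loop)

wild_loop, wild_down = "GGGC", "GCCC"   # positions 49-52 and 98-101
mutant_loop = "CCGC"                     # assumed: positions 49-50 changed to CC
mutant_down = "GCGG"                     # assumed: positions 100-101 changed to GG

pairing = {
    "wild type": can_pair(wild_loop, wild_down),
    "loop mutant only": can_pair(mutant_loop, wild_down),
    "downstream mutant only": can_pair(wild_loop, mutant_down),
    "double mutant": can_pair(mutant_loop, mutant_down),
}
```

in this sketch either single mutant breaks the stem, while the double mutant restores it with different bases. that outcome is what lets binding measurements distinguish a requirement for the structure from a requirement for the particular sequence.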
the mutant rnas were synthesized by enzymatic methods and assayed directly for s4 binding in vitro (tang and draper, 1989). nucleotide substitutions that disrupt the proposed pseudoknot also weaken s4 binding. several compensatory mutations which restore base pairing also restore binding affinity. for example, the G49G50→CC and C100C101→GG mutants are each recognized by s4 with a 6- to 10-fold lower affinity. however, the double mutant which combines both changes restores the putative pairing and the binding affinity. these and similar data with over 30 mutant rnas led to a revised and more complex structure which is visualized as a double pseudoknot. to my knowledge this is the first example of the use of direct protein-rna binding measurements to define an rna structure.

the changes in affinity that accompany disruption of the proposed structure are in all cases relatively small in terms of binding energy. for example, many of the changes in the s4 association constant are less than 10-fold, a change which itself corresponds to only 1.4 kcal per mole. a change of this magnitude could be due to perturbation of a van der waals interaction. there are no mutations in the proposed pseudoknot that totally eliminate binding and would, therefore, be analogous to point mutations in a trna that eliminate aminoacylation in vitro (hou and schimmel, 1988; schulman and pelka, 1988). possibly the most critical nucleotide determinants for s4 recognition within the pseudoknot have not been identified in the mutational analysis, as they have in the synthetase-trna system. alternatively, the interactions of α operon mrna with s4 may be distributed over many sites in the structure, so that any given alteration produces only a small change in affinity. it is noteworthy that one of the mutations that disrupts a base pair in the proposed pseudoknot structure cannot be rescued by a second site mutation which restores pairing.
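the 1.4 kcal per mole quoted above for a 10-fold change in the association constant follows from the standard relation between a ratio of equilibrium constants and a free energy difference, ΔΔG = RT ln(ratio). the sketch below is a worked check of that arithmetic; the function name and the two temperatures (25 °C and 37 °C) are assumptions, since the text does not state the assay temperature.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal per mole per kelvin

def binding_energy_change(fold_change, temperature_kelvin=298.15):
    """Free energy difference (kcal/mol) for a given fold change in K."""
    return GAS_CONSTANT * temperature_kelvin * math.log(fold_change)

ten_fold_25c = binding_energy_change(10.0)          # about 1.36 kcal/mol
ten_fold_37c = binding_energy_change(10.0, 310.15)  # about 1.42 kcal/mol
six_fold_25c = binding_energy_change(6.0)           # about 1.06 kcal/mol
```

both temperatures give values that round to 1.4 kcal per mole for a 10-fold change, and the 6-fold end of the observed range corresponds to roughly 1 kcal per mole.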
that unrescued position could be a site where s4 interacts directly with a specific base pair, and contributes a small incremental stability to the complex. ongoing experiments will assess the relationship between alterations that disrupt the pseudoknot-dependent protein-rna interaction in vitro and the extent of translational repression in vivo. regardless of the detailed interpretation of the experiments, the analysis of rna structure by these methods is instructive and has provided one of the first examples of a functional probe for pseudoknot formation.

the second study provides evidence that pseudoknot formation in a viral mrna is required for frameshift suppression of a termination codon that, in turn, allows a fusion protein to be synthesized from two overlapping reading frames. earlier work had demonstrated a role for frameshifting in the production of some retroviral gag-pol or gag-pro-pol fusion proteins (jacks et al., 1988; wilson et al., 1988, and references therein). most commonly, termination occurs at the gag stop codon to yield virus core protein. occasional "-1" frameshifting and subsequent cleavage of the resulting fusion protein is the mechanism for production of the viral reverse transcriptase (a pol gene product). frameshift mechanisms may also be operative in the production of reverse transcriptases encoded by retrotransposons in yeast (e.g., clare et al., 1988) and drosophila (e.g., marlor et al., 1986).

varmus and co-workers established that, in rous sarcoma virus mrna, only 147 nucleotides that encode the site for frameshifting are necessary (jacks et al., 1988). the sequence at the frameshift site is A AAU UUA, where the "0" reading frame is indicated. operationally there is simultaneous -1 slippage of a UUA-reading trna-leu (bound to the ribosomal a-site) and an AAU-reading trna-asn (bound to the p-site), which results in a double frameshift. in the -1 position these trnas are proposed to read their respective codons by two instead of three bases.
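the "two instead of three bases" statement can be checked directly on the seven-nucleotide site. the sketch below splits the sequence into codons in the 0 and -1 frames and counts, for each trna, how many codon positions are unchanged after the slip. comparing codons position by position is a simplification standing in for anticodon pairing, and the function names are hypothetical.

```python
def codons_in_frame(site, offset):
    """Split site into consecutive triplets starting at the given offset."""
    return [site[i:i + 3] for i in range(offset, len(site) - 2, 3)]

def shared_positions(codon_a, codon_b):
    """Count positions at which two codons carry the same base."""
    return sum(a == b for a, b in zip(codon_a, codon_b))

SLIPPERY_SITE = "AAAUUUA"  # A AAU UUA with the 0-frame spacing removed

zero_frame = codons_in_frame(SLIPPERY_SITE, 1)  # p-site codon, then a-site codon
minus_one = codons_in_frame(SLIPPERY_SITE, 0)   # the same two trnas after the slip

retained = [shared_positions(z, m) for z, m in zip(zero_frame, minus_one)]
```

here `zero_frame` is `['AAU', 'UUA']`, `minus_one` is `['AAA', 'UUU']`, and `retained` is `[2, 2]`: each trna keeps two of its three codon positions after slipping back by one nucleotide, consistent with the proposal in the text.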
slippage of these trnas is dependent on the formation of a hairpin stem-loop structure that is immediately downstream. thus, in addition to other possible mechanisms (see wilson et al., 1988) this work established that -1 frameshifting can be promoted by mrna secondary structure.

the work of brierley et al. (1989) is based on studies of a nonretroviral system, avian coronavirus infectious bronchitis virus (ibv). two long open reading frames overlap by 42 nucleotides, with the second frame shifted by -1 relative to the first. a -1 frameshift results in the production of a fusion protein. an 86 nucleotide element that spans the overlap region is sufficient to promote frameshifting, even in a heterologous context. this element encodes a stem-loop structure that starts six nucleotides downstream from the proposed site of frameshifting, near the end of the first reading frame. located 30 nucleotides further downstream from the 3' end of the putative stem is a sequence of seven bases that are complementary to the hairpin loop. thus, the overlap region encodes a sequence that could fold into a pseudoknot. support for the pseudoknot structure was sought by analysis of mutants that alternately disrupt and restore the proposed structure. mutations that disrupt either stem of the pseudoknot severely reduce frameshifting, while compensatory mutations that restore the pseudoknot also restore frameshifting. the pseudoknot inferred by this analysis is a quasicontinuous helix of 16 bp, and the necessary bridging by unpaired bases across the deep and shallow grooves is sterically feasible.

of the retrovirus and related systems suspected to use -1 frameshifting, brierley et al. (1989) found that over half have the potential for pseudoknot formation immediately downstream of the location of the frameshift. this includes three of the four systems where frameshifting has actually been confirmed. it is perhaps significant that jacks et al.
(1988) had shown that frameshifting is attenuated upon deletion of a region downstream of the critical stem in rous sarcoma virus mrna.

the mechanism for the pseudoknot-induced frameshift is unknown but there are several points worth noting. mechanisms for frameshifting with natural as opposed to mutant trnas have been studied in bacteria and eukaryotes (reviewed in roth, 1981; craigen and caskey, 1987). in bacteria, frameshift suppression can occur as a result of translational pausing, which is caused, for example, by starvation for an amino acid. in this case, frameshifting results from the use of a surrogate charged trna in place of the correct charged species (which is in limiting amounts). thus, a transiently unoccupied a-site on the ribosome may be "read" by a noncognate trna and that event can be accompanied by a frameshift and a missense substitution. the frameshifts that lead to fusion proteins in the retroviral and ibv examples are different in that the double frameshift event is not accompanied by missense substitutions. this suggests that a pseudoknot does not simply produce an unoccupied a-site.

it is not known whether a single hairpin stem equal in length to the elongated pseudoknot structure would be as effective in inducing a frameshift in the system studied by brierley et al. (1989). the authors show that the upstream stem alone induces frameshifting, but at a much lower efficiency than when the downstream sequences are allowed to form the second stem that results in a pseudoknot minihelix. thus, the frequency of frameshifting may in principle be fine-tuned by the size and detailed structure of the pseudoknot. i can suggest one possible advantage to the pseudoknot format over a standard hairpin helix of equivalent size. in the work of puglisi et al. (1988) the thermal stability of the pseudoknot helix was not greater than that of the more stable of the two stems from which it was assembled.
if this is a general principle, then pseudoknots provide a way to generate minihelices that have lower stabilities than their counterparts, which are assembled as unknotted continuous hairpins with the same number of base pairs. the reduction in stability may be critical for efficient movement of ribosomes through an element of secondary structure in an mrna.

even the spatial location of the pseudoknot relative to the frameshift site in ibv rna is sharply constrained, to less than three nucleotides (brierley et al., 1989). this emphasizes the importance of structural detail and context for the biological function of the pseudoknot in translation. in this and other respects, it has the properties of a substrate that is specifically recognized and acted upon by an enzyme sensitive to details of molecular shape and the spacing of functional groups. there is no evidence that a component of the translation apparatus (e.g., ribosomes) performs such a recognition function before triggering the double frameshift, but operationally the result is the same and the possibility has to be at least formally considered.

before the structure was solved, many predictions were made of the folding of trna. none of them correctly predicted the tertiary structure that was elucidated by x-ray diffraction analysis. this structure is now the basis for designing experiments to understand the interactions of trnas with proteins. in the systems described above, the proposed pseudoknot secondary structures are highly schematic. nmr analyses of "simple" pseudoknots are providing some of the structural parameters that will guide model building and design (cf. wyatt et al., 1989). further thermodynamic studies may allow more accurate energy estimates that can be the basis for including pseudoknots in programs that compute energy-minimized secondary structures (cf. zucker, 1989).
however, the data of tang and draper (1989) suggest a more complex arrangement for the α operon mrna pseudoknot than the basic motif proposed by pleij et al. (1985). in general, it is likely that tertiary structural features have an important role in the protein recognition and translational frameshifting that is now associated with rna pseudoknots. although in principle the analytical tools are available, it is these tertiary features that will be most difficult to work out.

molecular biology of rna

key: cord-350855-gofzhff7 authors: hou, yixuan j.; okuda, kenichi; edwards, caitlin e.; martinez, david r.; asakura, takanori; dinnon, kenneth h.; kato, takafumi; lee, rhianna e.; yount, boyd l.; mascenik, teresa m.; chen, gang; olivier, kenneth n.; ghio, andrew; tse, longping v.; leist, sarah r.; gralinski, lisa e.; schäfer, alexandra; dang, hong; gilmore, rodney; nakano, satoko; sun, ling; fulcher, m. leslie; livraghi-butrico, alessandra; nicely, nathan i.; cameron, mark; cameron, cheryl; kelvin, david j.; de silva, aravinda; margolis, david m.; markmann, alena; bartelt, luther; zumwalt, ross; martinez, fernando j.; salvatore, steven p.; borczuk, alain; tata, purushothama r.; sontake, vishwaraj; kimple, adam; jaspers, ilona; o’neal, wanda k.; randell, scott h.; boucher, richard c.; baric, ralph s. title: sars-cov-2 reverse genetics reveals a variable infection gradient in the respiratory tract date: 2020-05-27 journal: cell doi: 10.1016/j.cell.2020.05.042 sha: doc_id: 350855 cord_uid: gofzhff7

summary the mode of acquisition and causes for the variable clinical spectrum of covid-19 remain unknown. we utilized a reverse genetics system to generate a gfp reporter virus to explore sars-cov-2 pathogenesis and a luciferase reporter virus to demonstrate sera collected from sars and covid-19 patients exhibited limited cross-cov neutralization.
high-sensitivity rna in situ mapping revealed the highest ace2 expression in the nose with decreasing expression throughout the lower respiratory tract, paralleled by a striking gradient of sars-cov-2 infection in proximal (high) vs distal (low) pulmonary epithelial cultures. covid-19 autopsied lung studies identified focal disease and, congruent with culture data, sars-cov-2-infected ciliated and type 2 pneumocyte cells in airway and alveolar regions, respectively. these findings highlight the nasal susceptibility to sars-cov-2 with likely subsequent aspiration-mediated virus seeding to the lung in sars-cov-2 pathogenesis. these reagents provide a foundation for investigations into virus-host interactions in protective immunity, host susceptibility, and virus pathogenesis.

we measured the relative infectivity of the sars-cov-2 gfp virus in primary cells based on the average peak titers and observed that infectivity exhibited the same pattern as the ace2 expression levels from the upper to lower respiratory tract (figure 6bi-6biv). the icsars-cov-2-gfp virus replicated efficiently in hne and lae, with peak viral titers significantly higher than the titers in sae, at2-like and at1-like cultures (figure 6bv). although the viral peak titers were similar, the icsars-cov-2-gfp infection in hne culture resulted in significantly higher titers than lae at 24h, 48h and 96h post-infection, suggesting more robust replication in the primary nasal cells (figure 6bvi). collectively, these data indicate that virus infectivity/replication efficiency varies markedly from proximal airway to alveolar respiratory regions.

whole mount immunohistochemistry of hne and lae cultures was utilized to identify cell types infected by sars-cov-2 (figure 6c, s4a). the ciliated cell was routinely infected and extruded.
in contrast, the other major cell type facing the airway lumen, i.e., the muc5b+ club cell, was not infected, nor was the muc5ac+ metaplastic goblet cell. we did note a cell type co-expressing the ciliated cell marker tubulin and muc5b was rarely infected in hne, a finding consistent with infection of a secretory/club cell transitioning to a ciliated cell phenotype.

there is debate whether at2 and/or at1 cells express sufficient ace2 to mediate infection and whether at2, at1, or both cell types are infectable. previous studies reported 2003 sars-cov infects at2 but not at1 pneumocytes (mossel et al., … standard at2/at1 cell cultures and a novel cell culture approach that well preserves at2 and at1 cell populations over the infection/gfp expression interval were tested. as shown in figure 6a and s4b, at2 cells appeared to be preferentially infected.

second, to further characterize the infectivity of lae vs sae, replication rates of three sars-cov-2 viruses in lae and sae cultures from the same donor were compared. all three viruses replicated more slowly in sae than lae cells. the gfp virus replicated modestly less effectively than the clinical isolate or wt virus in the two regions (figure 6e). this observation differs from the equivalent replication noted in the vero-e6 cells (figures 2a and 2b), suggesting an intact orf7 gene contributes to sars-cov-2 replication, and perhaps virulence, in human tissues.

third, the replication of sars-cov and sars-cov-2 in lae cells was compared. sars-urbani wt and gfp viruses, in parallel with the three sars-cov-2 viruses, were administered to lae cultures from the same donor. gfp signals were detected in lae cultures for both viruses, but the sars-cov-2-gfp exhibited delayed and less intense signals than sars-cov-urbani-gfp (figure s4d).
this phenotype is consistent with the growth curve in which a lower titer of sars-cov-2 was recorded at 24h.

we utilized rna-ish/ihc to localize virus in four lungs from sars-cov-2-infected deceased subjects (table s1) … were also infected. rna in situ and ihc co-localization of an at2 cell marker, spc (sftpc) and at1 cell marker (ager) with sars-cov-2 indicated that at2 cells and at1 cells (or at2 cells that had transitioned to at1 cells) were infected (figure 7c …

we generated a sars-cov-2 reverse genetics system, characterized virus rna transcription profiles, evaluated the impact of ectopically expressed proteases on virus growth, and used reporter viruses to characterize virus tropisms, ex vivo replication, and to develop a high-throughput neutralizing assay. these reagents were utilized to explore aspects of early infectivity and disease pathogenesis relevant to sars-cov-2 respiratory infections.

our rnascope/cytospin technology extended the description of ace2 in respiratory epithelia based on scrnaseq data (sungnak et al., 2020). rna/cytospin detected ~20% of upper respiratory cells expressing ace2 vs ~4% for scrnaseq (figure 4f). most of the rna-ish-detected ace2-expressing cells were ciliated cells, not normal muc5b+ secretory (club) cells or goblet cells. notably, the nose contained the highest percentage of ace2-expressing ciliated cells in the proximal airways (figure 4g). the higher nasal ace2 expression-level findings were confirmed by qpcr data comparing nasal to bronchial airway epithelia. qpcr data also revealed that ace2 levels further waned in the more distal bronchiolar and alveolar regions. importantly, these ace2 expression patterns were paralleled by high sars-cov-2 infectivity of nasal epithelium with a gradient in infectivity characterized by a marked reduction in the distal lung (bronchioles, alveoli) (figures 6a and 6b).
multiple aspects of the variability in sars-cov-2 infection of respiratory epithelia were notable in these studies. first, significant donor variations in virus infectivity and replication efficiency were observed. notably, the variability was less in the nose than lower airways. the reason(s) for the differences in lower airway susceptibility are important but remain unclear (cockrell et al., 2018). we identified variations in ace2 receptor expression (figures 4a-d) but not numbers of ciliated cells as potential variables (figure 6d). second, variation in infectivity of a single cell type, i.e., the ciliated cell, was noted with only a fraction of ciliated cells having access to virus infected at 72 h (figure 6a). third, the dominant secretory cell, i.e., the muc5b+ club cell, was not infected in vitro or in vivo, despite detectable ace2 and tmprss2 expression (figures 4g-4i). collectively, these data suggest that measurements of ace2/tmprss2 expression do not fully describe cell infectivity and that a description of other variables that mediate susceptibility to infection, including the innate immune system(s), is needed (menachery et al., 2014).

the ace2 receptor gradient in the normal lung raised questions focused on the initial sites of respiratory tract virus infection, the mechanisms that seed infection into the deep lung, and the virus-host interaction networks that attenuate or augment intra-regional virus growth in the lung to produce severe disease, especially in vulnerable patients experiencing chronic lung or inflammatory diseases (guan et al., 2020; leung et al., 2020).

we speculate that nasal surfaces may be the dominant initial site for sars-cov…

in summary, our studies have quantitated differences in ace2 receptor expression and sars-cov-2 infectivity in the nose (high) vs the peripheral lung (low).
these studies should provide valuable reference data for future animal model development and expand the pool of tissues, e.g., nasal, for future study of disease pathogenesis and therapy. while speculative, if the nasal cavity is the initial site mediating seeding of the lung via aspiration, these studies argue for the widespread use of masks to prevent aerosol, large droplet, and/or mechanical exposure to the nasal passages. complementary therapeutic strategies that reduce viral titer in the nose early in the disease, e.g., nasal lavages, topical antivirals, or immune modulation, may be beneficial. finally, our studies provide key reagents and strategies to identify type-specific and highly conserved neutralizing antibodies that can be assessed most easily in the nasal cavity as well as in the blood and lower airway secretions.

acknowledgments

we would like to acknowledge the following funding sources from the national allergy [...]

further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, ralph s. baric (rbaric@email.unc.edu). material and reagents generated in this study will be made available upon installment of a material transfer agreement (mta).
development of a 1226 broadly accessible venezuelan equine encephalitis virus replicon particle vaccine 1227 platform gene 1230 expression and in situ protein profiling of candidate sars-cov-2 receptors in human 1231 airway epithelial cells and lung tissue engineering the largest rna virus genome as an infectious 1235 bacterial artificial chromosome a clinical consideration of abscesses and cavities of the lung the 1239 proximal origin of sars-cov-2 covid-19 for the cardiologist: a current review of the virology, clinical epidemiology, 1242 cardiac and other clinical manifestations and potential therapeutic strategies fitting linear mixed-effects 1245 models using lme4 characterization of a pathogenic full-length cdna clone and transmission model for 1248 porcine epidemic diarrhea virus strain pc22a detection of airborne severe acute respiratory 1251 syndrome (sars) coronavirus and environmental contamination in sars outbreak 1252 units muco-obstructive lung diseases human alveolar type ii cells secrete and 1256 absorb liquid in response to local nucleotide signaling non-neural 1258 expression of sars-cov-2 entry genes in the olfactory epithelium suggests 1259 mechanisms underlying anosmia in covid-19 patients pulmonary post-mortem findings 1263 in a large series of covid-19 cases from northern italy reverse 1266 genetics system for the avian coronavirus infectious bronchitis virus preliminary estimates of the prevalence of 1269 selected underlying health conditions among patients with coronavirus disease united states 1273 sars: prognosis, outcome and sequelae il-1beta dominates the promucin 1276 secretory cytokine profile in cystic fibrosis a spike-modified 1279 middle east respiratory syndrome coronavirus (mers-cov) infectious clone elicits mild 1280 respiratory disease in infected rhesus macaques impact of covid-19 on people with 1283 cystic fibrosis the spike glycoprotein of the new coronavirus 2019-ncov contains a furin-like 1286 cleavage site absent in cov 
of the same clade a single-cell atlas of the 1289 human healthy airways. biorxiv the 1291 microbiome and the respiratory tract single-cell analysis of olfactory 1294 neurogenesis and differentiation in adult humans mucus 1299 accumulation in the lungs precedes structural changes and infection in children with 1300 cystic fibrosis idiopathic pulmonary fibrosis: a genetic disease that 1303 involves mucociliary dysfunction of the peripheral airways comparative study of 1306 simulated nebulized and spray particle deposition in chronic rhinosinusitis patients novel human bronchial epithelial 1310 cell lines for cystic fibrosis research human nasal and tracheo-bronchial respiratory 1313 epithelial cell culture 1315 (2020). the oral-lung axis: the impact of oral health on lung health pharmacological rescue of 1319 conditionally reprogrammed cystic fibrosis bronchial epithelial cells chronic e-cigarette exposure 1323 alters the human bronchial epithelial proteome quantitative aspiration during sleep 1326 in normal subjects stabilization of a full-length infectious cdna clone of transmissible gastroenteritis 1329 coronavirus by insertion of an intron the 1332 species severe acute respiratory syndrome-related coronavirus: classifying 2019-ncov 1333 and naming it sars-cov-2 comorbidity and its impact on 1590 patients 1336 with covid-19 in china: a nationwide analysis sars-cov-2 cell 1340 entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease 1341 inhibitor clinical features of patients infected with 2019 novel coronavirus in wuhan pharyngeal aspiration in 1346 normal adults and patients with depressed consciousness angiotensin-converting enzyme 2 protects from severe 1349 acute lung failure the proteolytic regulation of virus cell entry by furin and other 1351 proprotein convertases pulmonary angiotensin-converting enzyme 2 (ace2) and inflammatory 1353 lung disease influenza a virus infection causes chronic lung disease linked to sites of active 1356 
viral rna remnants nrf2 expression modifies 1358 influenza a entry and replication in nasal epithelial cells. free radic mutations in 1362 rsph1 cause primary ciliary dyskinesia with a unique clinical and ciliary phenotype thromboembolic risk and anticoagulant therapy in covid-19 patients: 1366 emerging evidence and call for action a crucial role of angiotensin converting enzyme 2 (ace2) in 1370 sars coronavirus-induced lung injury lmertest package: 1372 tests in linear mixed effects models ace-2 expression in the small airway epithelia of 1375 smokers and copd patients: implications for covid-19 aerodynamic analysis of sars-cov-2 in two wuhan 1379 hospitals complement associated microvascular injury and 1382 thrombosis in the pathogenesis of severe covid-19 infection: a report of five cases maternal broadly neutralizing 1386 antibodies can select for neutralization-resistant, infant-transmitted/founder hiv variants efficient activation of the severe acute respiratory syndrome coronavirus spike 1390 protein by the transmembrane protease tmprss2 protease-1392 mediated enhancement of severe acute respiratory syndrome coronavirus infection trypsin 1396 treatment unlocks barrier for zoonotic bat coronavirus infection pathogenic influenza viruses and 1399 coronaviruses utilize similar and contrasting approaches to control interferon-stimulated 1400 gene responses a sars-like 1403 cluster of circulating bat coronaviruses shows potential for human emergence host cell entry of middle east respiratory 1406 syndrome coronavirus after two-step, furin-mediated activation of the spike protein airborne transmission of sars-cov-2: the world 1409 should face the reality sars-cov replicates in primary 1412 human alveolar type ii cell cultures but not in type i-like cells herpes simplex 1414 virus pneumonia: importance of aspiration etiology localization of secretory mucins 1417 muc5ac and muc5b in normal/healthy human airways dynamic 1421 expression of hopx in alveolar epithelial 
cells reflects injury and repair during the 1422 progression of pulmonary fibrosis asymptomatic cases in a family cluster with sars-cov-2 infection viral load of cov-2 in clinical samples nasal mucociliary clearance in health and 1429 disease the size distribution of droplets in the 1431 exhaled breath of healthy human subjects gastric 1433 emptying and glycaemia in health and diabetes mellitus a 1436 mathematical model describing the localization and spread of influenza a virus infection 1437 within the human respiratory tract influenza a viruses are 1440 transmitted via the air from the nasal respiratory epithelium of ferrets comparative pathogenesis of covid-19, mers, and sars in a nonhuman primate 1445 model muc5b 1448 promoter polymorphism and development of acute respiratory distress syndrome type 2 and 1452 interferon inflammation strongly regulate sars-cov-2 related gene expression in the 1453 airway epithelium transmission potential of sars-cov-2 in viral shedding observed at the university of 1457 fiji: an open-source 1460 platform for biological-image analysis reverse 1463 genetics with a full-length infectious cdna of the middle east respiratory syndrome 1464 coronavirus structural basis of receptor recognition by sars-cov-2 severe acute respiratory syndrome coronavirus infection of human ciliated airway 1470 epithelia: role of ciliated cells in viral spread in the conducting airways of the lungs a dynamic variation of pulmonary 1474 ace2 is required to modulate neutrophilic inflammation in response to pseudomonas 1475 aeruginosa lung infection in mice small molecule antipsychotic aripiprazole potentiates 1478 ozone-induced inflammation in airway epithelium sars-cov-2 entry 1481 factors are highly expressed in nasal epithelial cells together with innate immune genes high infectivity and 1484 pathogenicity of influenza a virus via aerosol and droplet transmission rapid reconstruction of sars-cov-1488 2 using a synthetic genomics platform characterization 
of mucins from cultured normal human tracheobronchial 1492 epithelial cells potent binding of 2019 novel coronavirus spike protein by a sars 1495 coronavirus-specific human monoclonal antibody structure, function, and antigenicity of the sars-cov-2 spike glycoprotein structural definition of a 1501 neutralization-sensitive epitope on the mers-cov s1-ntd proteolytic activation of the porcine epidemic 1505 diarrhea coronavirus spike fusion protein by trypsin in cell culture airborne transmission 1508 of severe acute respiratory syndrome coronavirus-2 to healthcare workers: a narrative 1509 review virological assessment 1512 of hospitalized patients with covid-2019 cryo-em structure of the 2019-ncov spike in the 1516 prefusion conformation genome composition and divergence of the novel coronavirus 1519 (2019-ncov) originating in china exposure to 1521 air pollution and covid-19 mortality in the united states characteristics of and important lessons from the 1524 coronavirus disease 2019 (covid-19) outbreak in china: summary of a report of 72314 1525 cases from the chinese center for disease control and prevention an infectious cdna clone of 1529 sars-cov-2 imaging and clinical features of patients with 2019 novel 1532 coronavirus sars-cov-2 structural basis for the 1534 recognition of sars-cov-2 by full-length human ace2 junctional and allele-specific residues are critical for 1537 mers-cov neutralization by an exceptionally potent germline-like antibody strategy for systematic assembly of 1540 large rna and dna genomes: transmissible gastroenteritis virus model reverse genetics with a full-length 1544 infectious cdna of severe acute respiratory syndrome coronavirus structural basis for the neutralization of mers-cov by a human monoclonal 1548 antibody mers-27 a novel coronavirus from patients with pneumonia in china potent cross-reactive 1554 neutralization of sars coronavirus isolates by human monoclonal antibodies sars-cov-2 viral load in upper respiratory 
specimens of infected patients

sars-cov-2 and sars-cov neutralization assays show limited cross neutralization
sars-cov-2 shows a gradient infectivity from the proximal to distal respiratory tract
ciliated airway cells and at-2 cells are primary targets for sars-cov-2 infection

present a reverse genetics system for sars-cov-2, which is then used to make reporter viruses to quantify the ability of patient sera and antibodies to neutralize infectious virus and to examine viral tropism along the human respiratory tract

sequence, n gene, 3'utr, and a 25nt poly-a tail, was assembled under the control of a t7 promoter. two reporter viruses, one containing gfp and the other harboring a gfp-fused nluc gene, were generated by replacing the orf7 gene with the reporter genes.

seven genomic cdna fragments were digested with appropriate endonucleases, resolved on 0.8% agarose gels, excised and purified using a qiaquick gel extraction kit (qiagen). a full-length genomic cdna was obtained by ligating seven fragments in an equal molar ratio with t4 dna ligase (neb). we then purified the ligated cdna with chloroform and precipitated it in isopropanol. the full-length viral rna and sars-cov-2 sgrna-n were synthesized using the t7 mmessage mmachine t7 transcription kit (thermo fisher) at 30℃ for 4h. the full-length sars-cov-2 transcript and sgrna-n were mixed and electroporated into 8×10^6 vero e6 cells. the cells were cultured as usual in the medium for two to three days.

the lumen, not in smg. (ciii-iv) h&e (iii) and dual-immunofluorescence staining using acetylated alpha tubulin (red) and anti-sars-cov-2 rabbit polyclonal antibody (green) (iv) from the trachea of a separate autopsy. related to figure 7b and s5di.
(d) regional distribution of sars-cov-2 rna from trachea to alveoli identified by rna-ish in one sars-cov-2 autopsy lung (in i and ii, viral staining is red; in iii, viral staining is turquoise). rna-ish dual color images demonstrate sars-cov-2 rna and sftpc mrna (alveolar type 2 cell marker) localization in alveoli of a sars-cov-2 autopsy lung. sars-cov-2 (turquoise) was identified in a sftpc (red)-positive (iii, arrow) and a

key: cord-327349-rxb6zfoc authors: au, lewis; boos, laura amanda; swerdlow, anthony; byrne, fiona; shepherd, scott t.c.; fendler, annika; turajlic, samra title: cancer, covid-19, and antiviral immunity: the capture study date: 2020-09-03 journal: cell doi: 10.1016/j.cell.2020.09.005 sha: doc_id: 327349 cord_uid: rxb6zfoc

the sars-cov-2 pandemic has posed a significant challenge for risk evaluation and mitigation amongst cancer patients. susceptibility to and severity of covid-19 in cancer patients have not been studied in a prospective and broadly applicable manner. capture is a pan-cancer, longitudinal immune profiling study designed to address this knowledge gap.

the global oncology community is faced with unprecedented challenges in the management of cancer patients in the context of the severe acute respiratory syndrome coronavirus 2 (sars-cov-2) pandemic. in order to safeguard cancer patients, broad 'self-shielding' policies and deviations from standard-of-care cancer management were implemented early in the pandemic to reduce the risk of viral transmission and severe covid-19 respectively. further, there was limited evidence to guide, or feasibility to implement, harm-minimisation strategies that account for the heterogeneity of the cancer population, the spectrum of cancer states (remission or progressive) and interventions (surgery, chemo-, immuno-, radio-, targeted-, hormonal-, cellular therapies, and bone marrow transplantation).
these measures inevitably will have an impact on long-term cancer outcomes and are not sustainable. as such measures are eased nationally and globally (e.g. the recent blanket halting of self-shielding in the u.k.), there remains significant uncertainty regarding the ongoing risk of infection and severe disease in cancer patients given the risk of recurrent outbreaks of sars-cov-2 and uncertain vaccine effectiveness. this applies both to those who have not yet been exposed, who are likely to be in the majority due to self-shielding practices, and to those who have been infected, given the uncertainty around long-term immunity to sars-cov-2 and the possibility of re-infection. therefore, questions on appropriate management of cancer patients will persist well into the future. another risk arises from cancer patients' regular access to hospital services, close interactions with healthcare workers (hcws) and increased hospital-acquired (termed 'nosocomial') transmission. difficulties in decision making therefore lie in balancing potential nosocomial and community exposure risks (e.g. regular travel) against anti-cancer therapy benefits. data on quantified risks of sars-cov-2 infection and adverse covid-19 outcomes associated with regular clinic attendances and treatment scheduling for patients are urgently needed to weigh against treatment benefit. this includes varied scenarios, such as commencing immune checkpoint blockade (cpi) for stage iv melanoma, with prospects of durable disease control or even cure, or a 6-month adjuvant chemotherapy regimen for resected breast cancer with moderate relapse risks. as such, an epidemiological and immunological understanding of the interaction between the virus, the host, the cancer and its management is needed across the clinical spectrum.
such understanding can then be placed in the context of the wider healthcare and public policy, to inform tailored harm-minimisation strategies for sars-cov-2 and maximise cancer-specific survival. additionally, cancer populations are a crucial group of patients to inform a broader understanding of the immune response to sars-cov-2. inherent perturbations in cell subsets (e.g. lymphoid and myeloid malignancies), or therapy-induced impact on immune states (e.g. immune checkpoint blockade) may provide opportunities to understand contributions of distinct immune compartments and key regulators of the anti-sars-cov-2 response. to date, the extent to which humoral or cellular responses contribute to effective immune protection, and how durable that immunity to sars-cov-2 is, remain unknown. herein, we aim to provide an overview of knowledge to date of the clinical features of covid-19 observed in cancer patients, as well as the potential impact of cancer and anti-cancer interventions on the immune response to sars-cov-2. we propose a comprehensive, longitudinal clinical outcomes and immune profiling program, the capture study, designed to rapidly accumulate data that will contribute to our understanding of immunopathology and immune protection in cancer patients, placed in the context of an evolving pandemic. early reports of sars-cov-2 outcomes in cancer patients have largely come from piecemeal, retrospective cohorts of infected patients (supplementary table 1). conclusions on subgroups were drawn from small numbers of patients; for example, increased risks of death on immunotherapy and chemotherapy were extrapolated from 6 and 17 patients respectively (zhang et al., 2020). lung cancer was highlighted as a particular risk in early studies, but death rates from acute severe infections varied widely. large registry studies have provided a more granular understanding of clinical features associated with the risk of adverse outcomes.
the uk coronavirus cancer monitoring project (ukccmp) (n=800) demonstrated that mortality from covid-19 was associated with general risk factors of age, gender, and comorbidities in patients with active cancer (lee et al., 2020). importantly, there was no significant impact on mortality for patients who were treated with chemo-, immuno-, hormonal-, targeted-, and radiotherapy within 4 weeks of covid-19 diagnosis. the covid-19 and cancer consortium (ccc19) (n=925) reported on cancer patients with mostly symptomatic covid-19 disease (kuderer et al., 2020). the overall mortality rate was 13%. interestingly, a leading factor for elevated mortality risk was progressing cancer, defined as progressive disease compared with those in remission (no measurable disease). additionally, the covid-surg collaborative registry reported a 27.6% mortality rate among covid-19 positive patients undergoing cancer surgery compared with non-cancer surgery (covidsurg collaborative, 2020). the teravolt study (thoracic cancers international covid 19 collaboration registry for thoracic cancers) focused on covid-19 positive patients with thoracic cancers (non-small cell and small cell lung cancers, mesothelioma, thymic epithelial tumours, and carcinoid/neuroendocrine tumours of thoracic origin), and observed a high death rate of 33% (n=200) (garassino et al., 2020) (supplementary table 1). a single-centre study in new york specifically on patients with lung cancer reported smoking status as a correlate of severe covid-19 (luo et al., 2020b), and anti-pd1 treatment did not impact covid-19 severity (luo et al., 2020a). the association of smoking and poor outcomes was also reflected in a large pan-cancer analysis of 423 patients with symptomatic covid-19 (robilotti et al., 2020). in contrast, treatment with cpi was a significant risk factor for severe outcomes of covid-19 in this study.
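to make the point about cohort size concrete, the following is a minimal sketch of how much statistical uncertainty surrounds the registry mortality rates quoted above. it is not an analysis performed by any of the cited studies. the wilson score interval is our choice of method, the function name `wilson_interval` is hypothetical, and the death counts are assumptions reconstructed by rounding the reported rate times the reported n (13% of 925 for ccc19, 33% of 200 for teravolt).

```python
import math


def wilson_interval(deaths: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= deaths <= n:
        raise ValueError("need n > 0 and 0 <= deaths <= n")
    p = deaths / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# ASSUMPTION: death counts are back-calculated from the rates and cohort
# sizes quoted in the text; the registries' exact counts may differ.
REGISTRIES = {
    "ccc19": (round(0.13 * 925), 925),
    "teravolt": (round(0.33 * 200), 200),
}

if __name__ == "__main__":
    for name, (deaths, n) in REGISTRIES.items():
        low, high = wilson_interval(deaths, n)
        print(f"{name}: {deaths}/{n} deaths, ~95% interval {low:.3f} to {high:.3f}")
```

the interval for the smaller cohort is noticeably wider, and subgroups of 6 or 17 patients would be wider still, which is one reason subgroup conclusions from early cohorts should be read with caution.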
the varied conclusions drawn to date on cpi risks further highlight the need to account for tumour types and other variables when considering the potential impact of anti-cancer interventions on covid-19 outcomes. collectively, these studies have begun to shed light on the epidemiological and clinical vulnerabilities in cancer patients who succumb to or endure poor outcomes of sars-cov-2. however, what has been critically missing in cohort and registry reports to date are data on 1) the true prevalence of sars-cov-2 infection in the cancer population, given population screening has not been widely implemented; and 2) the experience of those who remain well (uninfected, asymptomatic or subclinically affected), to determine the drivers of mortality and the absolute risks of severe adverse events within the cancer community as a whole. true clinical status (infected, negative, convalescent) for the great majority of cancer patients has not been established and hence true population prevalence is unknown. tracking of consecutive patients is also needed, to provide insights beyond a 'snapshot', as infection status for some will change over time. addressing these knowledge gaps requires a comprehensive, longitudinal study, designed to contemporaneously capture data on both infected (encompassing all phenotypes) and non-infected patients, followed over time, to reflect the experience of all patients with cancer (the 'denominator'). further, assessment of immune response features in response to sars-cov-2 in both these patient groups can inform potential mechanisms that may underpin the observed covid-19 clinical outcomes and correlations to specific features. there are multiple, varied mechanisms by which cancer, anti-cancer therapies, and other immune-modulating agents commonly used in cancer patients (e.g. steroids) modulate host immunity (figure 1a-b).
the extent to which this impacts anti-sars-cov-2 immune responses (innate or adaptive) and the clinical course of covid-19 in cancer patients warrants dedicated evaluation. montopoli and colleagues have observed that prostate cancer patients receiving androgen deprivation therapy (adt) may be protected against sars-cov-2 infections (montopoli et al., 2020). indeed, adt used in prostate cancer down-regulates tmprss2, the serine protease for s protein priming for cell entry by sars-cov-2. this is of particular importance, given this population generally associates with high-risk demographic features (i.e. male, advanced age) for adverse events with covid-19. preclinical studies suggest low-dose radiation can induce polarisation of macrophages toward the m-2 anti-inflammatory phenotype, and reduce il-1 and tnf-alpha target cells producing il-6 (lara et al., 2020). as such, thoracic radiation therapy may potentially attenuate pulmonary hyperinflammation characteristic of severe sars-cov-2 infections, where there is an increase in inflammatory monocyte-derived macrophages, accompanied by elevated levels of ccl2 and ccl7, both critical for monocyte recruitment. the peripheral antiviral response to sars-cov-2 in cancer patients may be compromised due to their malignancy or as a result of anti-cancer treatments. haematological malignancies can have specific defects in lymphoid (b- and t-cells) and myeloid lineages. patients with advanced, solid malignancies are characterised by persistent systemic hyperinflammation with elevated cytokines by virtue of uncontrolled disease, and cytotoxic chemotherapy and surgery are associated with lymphopenia and general immunosuppressive states. in covid-19, lymphopenia, elevated levels of circulating cyto/chemokines, and inflammatory markers (crp, d-dimers) are characteristic peripheral features.
in particular, lymphopenia is a hallmark of covid-19 especially in severe disease, with reduced levels of circulating nk cells, cd8 + t-cells, and to a lesser extent cd4 + cells. immune modulating agents, whether as adjunct or directed treatments in cancer may have distinct implications in covid-19. granulocyte colony-stimulating factor (g-csf) is commonly used in patients receiving myelosuppressive chemotherapy regimens to minimise risks of bacterial infections associated with neutropenia. however, the potential risks of hyperneutrophilic states in the context of sars-cov-2 infections are unknown, especially where respiratory compromise is mediated via tissue hyperinflammation. anti-cd20 antibodies are used in a variety of haematological malignancies to achieve b-cell depletion, potentially impacting generation of humoral immunity to sars-cov-2. glucocorticoids are often used for cancer patients, as adjunct treatment (i.e. premedication for chemotherapy), treatment for oncological emergencies (i.e. spinal cord compression), or directed towards the malignancy itself (i.e. for leukaemias and lymphomas). preliminary data from the recovery (randomised evaluation of covid-19 therapy) trial has demonstrated mortality benefits for administration of dexamethasone in patients requiring mechanical ventilation with established hyperinflammation (university of oxford, 2020). in light of this, a pertinent question would be whether steroid exposure earlier in the disease course blunts viral clearance or impairs mounting of an effective immune response, portending a worse outcome for patients with cancer. a longitudinal understanding of the degree to which the immunocompromised states of cancer patients impact infection, viral clearance, clinical course of covid-19, and subsequent generation of long-term immunity is needed. 
additionally, de novo viral mutational changes within the host are a pertinent feature of rna viruses, and evaluation of viral evolution in infected patients who may not mount an optimal immune response will be of importance. the intersection of the pd1/pd-l1 axis with viral response is of exceptional relevance in the cancer population, given cpis are associated with durable responses or even cure in multiple metastatic tumour types, and are used in the adjuvant setting with curative intent. is checkpoint blockade a friend or foe in the context of covid-19? cpi reverses immunosuppressive states by reinvigoration of exhausted t-cells for antigen recognition. conceivably anti-pd-1 therapy could reinvigorate exhausted t-cells during viral infection to suppress viral dissemination. conversely, it is the overactivation of t-cells contributing to hyperinflammation and tissue damage that underpins the lung pathology and morbidity in covid-19, and in this phase cpi could potentially be harmful. additionally, nk and t-cells in covid-19 patients show increased activation and exhaustion markers (pd-1 and tim-3), and t-cell activation and antibody response with increased titres of neutralising antibodies are closely related (grifoni et al., 2020) . lastly, little is known about acute viral infections whilst on anti-pd1 therapy. memory t-cells, which are also pivotal in recognition of infection, may clonally expand intratumorally but differentiation from naive to memory t-cells seems to be reduced with anti-pd1 blockade (wei et al., 2019) . this may impair the adaptive response to sars-cov-2 infection. patients receiving cellular therapeutics (stem cell transplantation, chimeric antigen receptor t-cells, or tumour-infiltrating-lymphocytes) represent a clinically distinct patient group that potentially renders them most vulnerable to sars-cov-2. 
these patients are both faced with secondary immunodeficiencies due to high-dose cytotoxic and immunosuppressive regimens, and also treatment complications related to immune dysregulation or hyperinflammation including graft-versus-host-disease and cytokine-release-syndromes. further factors to consider are superimposed nosocomial and opportunistic infection risks from extended hospital admissions, not limited to covid-19. finally, it is equally important to understand how covid-19 impacts anti-cancer immune response, to consider both detrimental and beneficial effects. the concept of oncolytic virotherapy was born out of serendipitous observations between cancer regression and viral infections or immunisations, leading to treatments such as talimogene laherparepvec (t-vec) as a genetically modified herpes simplex virus used for management of malignant melanoma today. examples of the varied mechanisms by which host immunity (innate or humoral/cellular adaptive immunity) is altered, with potential influence on the immune response as well as immunopathologic hyperinflammation of covid-19. sact, systemic anticancer therapy; rx, other anti-cancer treatments including radio-, hormonal-, targeted-, and cellular therapy; adt, androgen deprivation therapy; car-t, chimeric antigen receptor t-cell. the capture study capture (covid-19 antiviral response in a pan-tumour immune monitoring study) is a prospective, longitudinal study of cancer patients and hcws, established in response to the unique challenges of the sars-cov-2 pandemic for the care of cancer patients (https://www.royalmarsden.nhs.uk/capture-covid-19-antiviral-response-pan-tumourimmune-study). the overarching aim is to establish a prospective and unbiased understanding of the susceptibility to and morbidity of sars-cov-2/covid-19 in cancer patients and the patterns of nosocomial viral transmission. 
such understanding could inform clinical decision making and healthcare policy, especially advice on self-shielding, safe delivery of cancer therapy and reduction of transmission. first, to achieve an accurate picture of the true incidence and prevalence of sars-cov-2 we screen all patients longitudinally, irrespective of their clinical presentation: symptomatic, asymptomatic, and convalescent. second, to facilitate a broad understanding of the impact of cancer and cancer therapies on the course of sars-cov-2, we include patients from across a range of cancer types and cancer interventions. third, we follow the patients up long-term to understand the extent and duration of immunity, the impact of immune-modulating therapies (including antivirals and vaccines), incidence of re-infection, and long-term sequelae of sars-cov-2, including the impact on cancer outcomes. potential re-infections will be defined as testing positive for sars-cov-2 following convalescence and prior evidence of viral clearance. through comprehensive blood sampling we aim to understand the correlates of immune pathology and protection, and if and how these are impacted by inherent or iatrogenic immune defects in cancer patients and the host characteristics such as hla haplotype and germline variation (figure 2) . we seek to elucidate differences in immune response among cancer patients, incorporating comparative analyses between infected and uninfected patients, and also compared to non-cancer controls within the study and with published data in non-cancer patients. where measures of incidence and prevalence will reflect evolving public health plans, the immunological findings will be generalisable with respect to impact of cancer and therapy on covid-19 outcomes. finally, while we are aiming to detect accurate biological signals through detailed profiling, no one institution will be able to accrue sufficient patient numbers for detailed subgroup analyses. 
therefore, we are expanding to include other centres in the u.k., and would welcome other centres internationally to join this effort. an important question with regard to sars-cov-2 infections in cancer patients is the extent to which they are due to nosocomial exposure, either from other patients or from hcws, and the extent to which this affects outcomes. related recruitment of hcws within capture will achieve an accurate picture of incidence/prevalence as well as long-term immunity and reinfection in this group. critically, through viral lineage studies of patients and associated hcws we aim to understand nosocomial infection and place it in the wider context of community and hospital patterns of transmission. a key issue in study design in the context of a rapidly evolving pandemic is that pertinent research questions and hypotheses will emerge over time. an important element of the design of capture, therefore, has been to collect data and biological samples that both address immediate questions around safe provision of cancer care and inform an understanding of antiviral responses in the context of cancer, and that also future-proof the study for additional opportunities, analyses, and collaborations. a pertinent example will be the monitoring of immune responses to future sars-cov-2 vaccines in patients with cancer within this adaptable, longitudinal framework. we commenced recruitment for capture on 4th may 2020 at the royal marsden nhs foundation trust and are expanding recruitment to include other sites in the u.k. three months since study opening, 227 patients and hcws have been enrolled, with 536 swab and 448 blood specimens collected longitudinally. the capture study management group is composed of a broad team of multidisciplinary clinicians, with detailed input from a patient representatives working group about the overall approach and the merit of the research to cancer patients.
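the working definition of a potential re-infection given earlier (testing positive for sars-cov-2 following convalescence and prior evidence of viral clearance) can be sketched as a rule applied to a participant's longitudinal swab results. this is an illustrative sketch only, not the study's analysis code. the names `SwabResult` and `is_potential_reinfection` are hypothetical, and we assume, for illustration, that "evidence of viral clearance" means at least one negative swab taken after an earlier positive swab; clinical convalescence is not modelled separately.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SwabResult:
    """One longitudinal swab result for a single participant (hypothetical record)."""
    taken_on: date
    positive: bool


def is_potential_reinfection(results: list[SwabResult]) -> bool:
    """Return True if a positive swab follows a negative swab that itself
    follows an earlier positive swab (positive -> cleared -> positive).

    ASSUMPTION: a single negative swab after a positive one is treated as
    evidence of viral clearance; a real protocol may require more.
    """
    seen_positive = False
    cleared = False
    for result in sorted(results, key=lambda r: r.taken_on):
        if result.positive:
            if cleared:
                return True
            seen_positive = True
        elif seen_positive:
            cleared = True
    return False


if __name__ == "__main__":
    history = [
        SwabResult(date(2020, 5, 4), True),
        SwabResult(date(2020, 6, 1), False),
        SwabResult(date(2020, 8, 3), True),
    ]
    print(is_potential_reinfection(history))
```

sorting by sampling date before scanning means the rule does not depend on the order in which results are recorded, which matters when swabs from several visits are processed centrally.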
the laboratory studies will take place at the francis crick institute, utilising a clinically accredited sars-cov-2 diagnostic pipeline, and leveraging significant infection and immunity expertise and crick's status as a worldwide influenza centre with current active covid-19 research (https://www.crick.ac.uk/research/covid-19). patients are recruited into study arms a or b, and follow-up schedules are bespoke to their covid-19 status and account for their clinical visit/treatment requirements, while hcws are recruited and followed up according to the schedule as outlined. comprehensive demographic, epidemiological and clinical data will be collected along with swabs and blood samples. diagnostic assays will be performed for all participants and will inform selection of samples for detailed immune monitoring and next generation sequencing. standard operating procedure documents (for local sample collection, processing, and storage) will be implemented in additional institutions, and sample analyses will be centralised to ensure standardisation. an extensive clinical case report form (crf) is being used covering 34 categories including demographic data; data on concurrent medication and co-morbidities; details of the clinical course and outcome of sars-cov-2 (including need for hospital admission, respiratory and other organ support, and full range of laboratory investigations); and cancer-specific data, including staging, prognostic information, previous and current anti-cancer interventions and outcomes, to facilitate analysis of interactions between the susceptibility to and severity of sars-cov-2 infection and the presence of malignancy and cancer treatments. a quality of life assessment is included, which will shed light on the ongoing complications of covid-19. in order to capture more nuanced details regarding lifestyle and environmental factors and household risks, self-reported data are collected through a secure online questionnaire.
given the widespread modification to cancer care through reduced hospital visits, reduced frequency of blood tests and imaging, and most critically modification of standard-of-care therapies, we also aim to collect data on the impact of these measures on diagnosis, management and outcomes of cancer. specifically, our quality of life questionnaires and crf generally were designed to allow integrative analyses with other studies, including local and international efforts such as the cancer research u.k. covid-19 survey, ukccmp, ccc19, and the european society of medical oncology covid-19 registry for patients with cancer (esmo-cocare). a critical feature of capture is that it will continue follow-up as well as recruitment of new patients over the next 24 months on the back of changing risk of infection in the community and hospital, and ongoing changes in public health policy. self-shielding guidance in the u.k. is now lifted across the board in the same way that it was applied, and yet this is a very heterogeneous group of patients with different sets of vulnerabilities and in whom risk assessment will need to be individualised. the current policies also mean that prevalence in cancer patients is likely still lower than in the general population, and thus a greater number of cancer patients will be at risk in the coming months. an important risk is the reliance of cancer patients on ongoing access to healthcare provision and frequent hospital attendance. for patients undergoing complex cancer surgery, high-dose chemotherapy, and adoptive cellular therapy the length of hospital stay can run into many weeks, while for others frequent administration of intravenous drugs may mandate weekly visits to the hospital. hcws caring for cancer patients are a key group in which we need an understanding of the incidence of asymptomatic infection, "silent" transmission and the risk of re-infection following initial exposure.
large scale study of closely related and co-located hcws and patients participating in this study, will help to define transmission patterns across the hospital, highlighting the opportunities for interventions and policy change. longitudinal collections of viral swabs paired with comprehensive blood sampling (for analyses including multi-assay serology, cyto/chemokine profiling, immunophenotyping on peripheral blood mononuclear cells, tcr/bcr sequencing, and germline analyses) will give maximal scope for detailed mapping and monitoring of immune responses to sars-cov-2. the longitudinal collection of viral swabs in capture will enable determination of the duration of infection in those testing positive. pcr analysis for additional viruses (including other human coronaviruses [hcovs] and non-corona viruses) in symptomatic patients testing negative for sars-cov-2 coupled with serological analysis will allow evaluation for cross-reactivity and/or cross-immunity with non-sars-cov-2 viruses. viral genome sequencing and phylogenetic analysis in both cancer patients and hcw, analysed against the background of genetic drift in the virus, will define the nosocomial transmission patterns, which has been reported to define transmission clusters even in a single institution (rockett et al., 2020) . early reports posit that cancer patients have lower seroconversion rates -that is, decreased ability to generate detectable levels of specific anti-sars-cov-2 antibodies following viral exposure -than controls (solodky et al., 2020) . additionally, durability of immunity is entirely unexplored. extended follow-up and longitudinal sampling within capture afford the opportunity to evaluate both sufficiency and durability of immune responses in cancer patients and in the context of a potential future vaccine. antibody responses will be monitored for s-binding igg, igm and iga, which will also help to investigate the crossreactivity of antibodies raised in response to other hcov. 
results will be supplemented by elisa against viral s1 subunit. seroconverted patients will further be monitored by neutralising antibody assays to obtain a precise measure of the biological activity of detected antibodies. we will analyse t-cell activation in sars-cov-2 positive and negative patients and compare those with the responses in control groups to identify impairments in adaptive immunity related to the underlying cancer and associated cancer therapies. this will allow us to explore t-cell activation and antibody response relationships, which will be supplemented by detailed immunophenotyping, rna sequencing of peripheral blood, as well as tcr and bcr sequencing to draw a detailed picture of both innate and adaptive immune response in cancer patients. germline profiling, including hla typing and the detection of polymorphisms in immune-related genes will reveal genetic susceptibilities to severe covid-19 and will supplement the detailed analysis of immune response to obtain a comprehensive picture of susceptibility for each participant. the detailed analyses of the immune system in response to sars-cov-2 allow scope for biomarker discovery, with the aim of identifying cancer patients most vulnerable to infection. while it is clear that the increase of cyto/chemokines is a hallmark of severe covid-19 late in the course of infection, what is needed are early indicators prior to clinical progression or infection for those destined for maladaptive hyperinflammation. prospective monitoring of cytokine kinetics in the general cancer population may allow detection of differences, which will likely be a multi-parametric set of indices, that may serve as baseline predictors for their response to sars-cov-2. sars-cov-2 exposure, infection, and cancer will continue to coincide in the foreseeable future, even if not on a pandemic scale, pending widespread implementation of a successful vaccine or pharmacotherapy.
there is also a need to learn lessons for future similar pandemics. clinicians and policy makers alike need to be armed with a fundamental understanding of the interaction between host immunity, the virus, cancer, and cancer treatments placed in the wider healthcare context in order to minimise harm and optimise cancer outcomes. prospective data on immune responses derived from large numbers of patients representative of the entire clinical spectrum, with a relevant denominator population are needed. in the same way that we have evolved to personalised medicine in the management of cancer, we must also adopt a bespoke approach to the management of sars-cov-2 in cancer patients, extending into the post-pandemic era, and grounded in evidence-based practice.
mortality and pulmonary complications in patients undergoing surgery with perioperative sars-cov-2 infection: an international cohort study
covid-19 in patients with thoracic malignancies (teravolt): first results of an international, registry-based, cohort study
targets of t cell responses to sars-cov-2 coronavirus in humans with covid-19 disease and unexposed individuals
clinical impact of covid-19 on patients with cancer (ccc19): a cohort study
low dose lung radiotherapy for covid-19 pneumonia. the rationale for a cost-effective anti-inflammatory treatment
covid-19 mortality in patients with cancer on chemotherapy or other anticancer treatments: a prospective cohort study
impact of pd-1 blockade on severity of covid-19 in patients with lung cancers
androgen-deprivation therapies for prostate cancer and risk of infection by sars-cov-2: a population
dexamethasone reduces death in hospitalised patients with severe respiratory complications of covid-19
determinants of covid-19 disease severity in patients with cancer
revealing covid-19 transmission in australia by sars-cov-2 genome sequencing and agent-based modelling
lower detection rates of sars-cov2 antibodies in cancer patients versus symptomatic covid-19
pd-1 silencing impairs the anti-tumor function of chimeric antigen receptor modified t cells by inhibiting proliferation activity
clinical characteristics of covid-19-infected cancer patients: a retrospective case study in three hospitals within wuhan
we thank the capture trial team, including eleanor carlyle, kim edmonds, and lyra del rosario, as well as somya agarwal, hamid ahmod, ravinder dhaliwal, lauren dowdie, lucy holt, justine korteweg, charlotte lewis, karla lingard, mary mangwende, aida murra, kema peat, sarah sarker, nahid shaikh, sarah vaughan, and fiona williams. we also thank mike gavrielides for informatics support, the volunteer staff at the francis crick institute, antonia toncheva, karolina rzeniewicz, and nicole neuman for editorial assistance. due to limitations on cited references and the pace at which the field is evolving, we acknowledge researchers in covid-19, particularly in furthering our understanding of clinical correlates and immune responses in patients with cancer. the capture study is sponsored by the
key: cord-301997-63160t7f authors: schwer, beate; visca, paolo; vos, jan c.; stunnenberg, hendrik g.
title: discontinuous transcription or rna processing of vaccinia virus late messengers results in a 5′ poly(a) leader date: 1987-07-17 journal: cell doi: 10.1016/0092-8674(87)90212-1 sha: doc_id: 301997 cord_uid: 63160t7f abstract we have demonstrated by primer elongation and cap analysis that mature vaccinia virus late transcripts are discontinuously synthesized. we have shown that rna transcripts from a translocated 11k and from the authentic 11k and 4b late promoters are extended by approximately 35 nucleotides beyond the “start site” determined by s1 mapping using vaccinia genomic dna as a probe. sequencing of the rna and of the first strand cdna reveal that a homopolymeric poly(a) sequence is linked to the 5′ terminus of the rna transcripts. s1 mapping of rna transcripts with a dna probe containing an a-stretch, replacing promoter sequences upstream of position −1, confirms the existence of a poly(a) leader of approximately 35 a-residues. beate schwer, paolo visca, jan c. vos, and hendrik g. stunnenberg, european molecular biology laboratory, meyerhofstraße 1, 6900 heidelberg, f.r.g.
transcription of the cytoplasmic vaccinia virus can be divided into two phases: an immediate early/early phase starting shortly after the infection of the cell, and a late phase starting with the onset of dna replication (2-5 hr after infection). the mechanisms of the temporal regulation of gene expression, i.e., the switching from early to late transcription, are unknown. the early genes are characterized by the presence of an untranslated leader sequence and their transcripts have discrete 3' ends (venkatesan et al., 1981; yuen and moss, 1986). in contrast, late genes appear to lack an untranslated leader as well as termination signals at their 3' ends, which results in readthrough by rna polymerase. furthermore, early termination signals are not recognized in the late phase of infection (smith et al., 1984). basic promoter sequence elements such as a tata box or caat sequence are not present in vaccinia promoters. the vaccinia early and late promoters are not recognized by prokaryotic or eukaryotic rna polymerases (smith et al., 1984). vaccinia messengers appear to have a capped 5' end consisting of a 7-methyl-guanosine (m7g) residue (wei and moss, 1975; urushibara et al., 1975) and they have an a-tail at their 3' ends (nevins and joklik, 1975). the regulatory signals controlling the transcription and the time of gene activation reside in very short stretches of approximately 20 to 30 bp. this has been shown by translocation of promoter fragments by means of homologous recombination (cochran et al., 1985b; rosel and moss, 1985; hanggi et al., 1986). vaccinia late promoters are characterized by the presence of a highly conserved taaat motif that overlaps the site of transcription initiation as determined by s1 mapping (plucienniczak et al., 1985; hanggi et al., 1986).
the functional analysis of the vaccinia late promoter of the 11 kd basic polypeptide revealed that the sequences from position -29 to +8 (+1 is arbitrarily defined as the a-residue of the aug) are sufficient for transcriptional activity (hanggi et al., 1986). this was shown through the insertion of a chimeric gene consisting of the wild-type 11k promoter up to position +8, and the coding region of the mouse dihydrofolate reductase (dhfr) gene into the vaccinia thymidine kinase (tk) gene by homologous recombination (panicali et al., 1982; mackett et al., 1982). we have shown that mutations within the conserved taaat motif result in complete inactivation of promoter activity. we further show that mutations of sequences surrounding the taaat motif either have no effect or increase the overall promoter strength. the mutated regions of the translocated fragment include the start codon of translation; this sequence is partially conserved as reflected in the consensus sequence taaat (hanggi et al., 1986). the aug start codon of translation is either part of the taaat motif or immediately adjacent to this element. this paper concerns the study of the structure of the 5' terminus of vaccinia virus late rna transcripts. we demonstrate that the transcripts of both wild-type and translocated promoters are discontinuously synthesized and obtain a poly(a) leader sequence. we will discuss possible mechanisms involved in the synthesis of the discontinuous late transcripts. preliminary observations made in our laboratory demonstrated that mutations downstream of the taaat motif result in a 4- to 6-fold increase in promoter strength as compared to the wild-type translocated promoter. in contrast, the efficiency of translation can be decreased more than 10-fold as compared to the wild-type translocated-dhfr construct. this phenomenon appears to be independent of the test gene cloned downstream of the mutated promoter (unpublished data).
a detailed analysis of the different mutants will be presented elsewhere. the reduction of the translatability of these messengers indicates that the sequence at the 5' end of the mrna might be directly or indirectly involved in a translational control mechanism. we have therefore analyzed the structure and sequence at the 5' end of late vaccinia messengers in more detail. the site of transcription initiation of late vaccinia mrna has been determined thus far by s1 mapping experiments using genomic dna as a probe (cochran et al., 1985b; rosel and moss, 1985; bertholet et al., 1985; hanggi et al., 1986). the reason for this is that primer elongation experiments are complicated by a high degree of complementarity in late vaccinia rna transcripts: both dna strands are transcribed, genes can overlap, and there is readthrough of the rna polymerase (plucienniczak et al., 1985; smith et al., 1984). if, however, the mature messenger is discontinuously synthesized, s1 mapping experiments can only reveal the putative junction site and not the 5' end of the mature messenger. we have been able to avoid a high nonspecific background in primer extension experiments by end-labeling the synthetic oligonucleotide primers. the results of such primer extension experiments using different gene internal primers and rna derived from the translocated 11k promoter-dhfr gene construct are shown in figure 1. if the 5' end of the rna as determined by s1 mapping (hanggi et al., 1986) represents the genuine 5' end of the mature transcript, we would obtain a primer elongated cdna migrating at the position indicated by the arrows. however, the majority of the cdna products are extended by approximately 35 bases beyond the s1 "start site" independent of the position of the synthetic oligonucleotide (figure 1, lanes 2-4).
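the length comparison behind this argument can be written out as simple arithmetic. the following is a minimal sketch, not part of the original analysis: it assumes the labeled 5' end of the primer pairs with a known mrna coordinate (+1 being the a-residue of the aug), and it uses two hypothetical parameters, `upstream_nt` (templated nucleotides between the s1 "start site" and +1) and `leader_nt` (nucleotides 5' of the s1 "start site"). the function name and the default values are illustrative only.

```python
def expected_cdna_length(primer_5p_end: int, upstream_nt: int = 0, leader_nt: int = 0) -> int:
    """Length (nt) of a fully extended primer-extension cDNA.

    primer_5p_end: mRNA coordinate (+1 = A of the AUG) that pairs with the
        labeled 5' end of the primer, e.g. 46 for a primer spanning +24 to +46.
    upstream_nt: templated nucleotides between the S1 "start site" and +1
        (hypothetical parameter; 0 treats the start site as lying at +1).
    leader_nt: nucleotides located 5' of the S1 "start site" (the leader).
    """
    if primer_5p_end < 1 or upstream_nt < 0 or leader_nt < 0:
        raise ValueError("coordinates and lengths must be non-negative, primer end >= +1")
    # The cDNA covers every mRNA nucleotide from the 5' end up to the primer's 5' end.
    return primer_5p_end + upstream_nt + leader_nt


# Primer spanning +24 to +46 (one of the primers described in the text).
s1_only = expected_cdna_length(46)                    # transcript ends at the S1 "start site"
with_leader = expected_cdna_length(46, leader_nt=35)  # ~35-base leader, as estimated in the text
print(s1_only, with_leader, with_leader - s1_only)
```

whatever primer is used, the difference between the two predictions equals the leader length, which is why an extension of constant size with primers at different positions points to additional sequence at the 5' end rather than to a primer-specific artifact.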
as a negative control, we have performed primer extension with a dhfr primer and wild-type mrna that does not contain dhfr sequences (figure 1). the observation that primer extension does not coincide with s1 mapping indicates that the transcripts are discontinuous and that a leader rna is linked to the transcripts. on a long exposure a faint band migrating at the position of the s1 "start site" is detectable (not shown). furthermore, very long cdnas are obtained, which might represent readthrough transcripts, and cdnas shorter than the s1 "start site," which are probably premature stops of the reverse transcriptase.
figure legend: cdna-rna hybrids were incubated in the presence or absence of rnaase a at a final concentration of 10 µg/ml and the hybrids were cap-selected by immunoprecipitation using a rabbit anti-m7g antiserum as described. (wt) primer extension without cap selection. (-) cap selection without rnaase treatment. (+) treatment with rnaase a prior to cap selection. (m) [32p]-labeled hpaii-digested pbr-322 dna size markers. the position of the taaat motif representing the s1 "start site" is indicated by the arrow.
the 5' end of mature mrna is characterized by the presence of a cap structure consisting of a 7-methyl-guanosine (m7g) residue linked with a triphosphate bridge to the rna. furthermore, the adjacent residue (in general g or a) is methylated at the 2'-o-ribosyl position (banerjee, 1980). vaccinia messenger rnas also have a m7g cap structure (wei and moss, 1975; urushibara et al., 1975).
figure legend: rna from cells infected with the 11k-dhfr recombinant virus was primer extended using end-labeled oligonucleotides for the dhfr, the authentic 11k, and 4b genes. lanes 1 and 2: dhfr transcripts with primers at position +24 to +46, respectively, +66 to +113; lanes 3 and 4: 11k transcripts with primers at position +14 to +33, respectively, +52 to +71; lane 5: 4b transcripts with a primer at position +10 to +29. the positions of the taaat motif representing the s1 "start sites" are indicated by the arrows. (m): [32p]-labeled hpaii-digested pbr-322 dna size markers.
in order to prove that the rna transcripts containing the discontinuous leader sequence are mature messengers with a cap structure, we used a polyclonal antiserum against the m7g cap (munns et al., 1982) to immunoprecipitate the cdna-rna hybrids. the cdna-rna hybrids from the 11k translocated promoter, which are extended beyond the s1 start site (indicated by a bar), are selectively enriched by the immunoprecipitation (figure 2, lane -). some additional smaller cdna fragments are also retained, as are some readthrough products. the cdna band migrating at the position of the s1 "start site" is not precipitated by the antibodies, indicating that this rna species does not have a cap structure. the length of the "extension" appears to vary within 30-40 nucleotides. this heterogeneity is not due to premature stops of the reverse transcriptase: the cdna-rna hybrids can be retained by the antibody column after treatment with rnaase a prior to cap selection. in this experiment, the cdna-rna hybrids can only be retained by the antibody column if the rna including the cap structure is protected from rnaase digestion by a full-length cdna. the result shows that the rna transcripts extended beyond the s1 "start site" are protected from the rnaase treatment by the cdna (figure 2, lane +). we conclude that these transcripts represent mature capped messengers and are not the result of premature stopping of the reverse transcriptase. the smaller cdna-rna hybrids as well as the majority of the very large cdna-rna hybrids are sensitive to rnaase treatment (figure 2). this rnaase sensitivity shows that these cdnas represent premature stops of the reverse transcriptase and not full-length cdnas. the full-length cdna-rna hybrids are also insensitive to digestion with a combination of the rnaases a, t1, and cl3 prior to cap selection (results not shown).
the rnaase cl3 has a preference for cpn but also cleaves apn bonds under the test conditions (levy and karpetsky, 1980). we have demonstrated that transcripts from the 11k translocated promoter-dhfr gene construct are discontinuously synthesized with a leader rna of approximately 35 bases 5' of the s1 "start site." the question now arises as to whether this is a general phenomenon of vaccinia late transcription. we have performed primer extension with synthetic oligonucleotides of two wild-type vaccinia mrnas coding for the authentic 11k basic polypeptide (wittek et al., 1984) and for the structural protein 4b (rosel and moss, 1984). the results clearly demonstrate that both the authentic 11k (figure 3, lanes 3 and 4) and the 4b (figure 3, lane 5) messengers are extended by approximately 35 nucleotides beyond the s1 "start site" as we have shown for the translocated 11k messengers (figure 3, lanes 1 and 2). after rnaase treatment-cap selection of the cdna-rna hybrids, we obtained the same results as for the wild-type translocated promoter-dhfr messengers (data not shown). the covalently linked leader rna was sequenced by primer elongation in the presence of dideoxynucleotides using an end-labeled synthetic oligonucleotide. these experiments were again complicated by the presence of relatively high amounts of readthrough transcripts which initiate at upstream promoters. we therefore obtained two different rna sequences: the promoter sequence which is present in the readthrough transcripts and the sequence of the discontinuous transcripts initiated at the respective promoter upstream of the putative junction. downstream of the junction we obtained the uniform sequence of the coding body of the gene. the sequence of the rna transcripts from the 11k translocated (figure 4a) and both wild-type 11k (figure 4b) and 4b (not shown) promoters show a stretch of minimally 10-15 t-residues in the complementary strand upstream of the aug translation start codon.
this sequence is present neither in the genomic sequence of the different promoters nor immediately upstream of them. we also observed on longer exposures promoter sequences present in the readthrough transcripts which are initiated at promoters further upstream. the readthrough is, however, more prominent in transcripts from the translocated promoter construct within the tk locus than in the authentic 11k transcripts located in the left-hand side of the genome.
sequencing of the first strand cdna
the rna sequencing revealed the presence of an a-stretch of at least 10-15 nucleotides 5' of the aug start codon. we wished to determine the complete sequence of the cdna, but were unable to lower the dideoxynucleotide concentration without the introduction of unspecific ghost bands. furthermore, standard cdna cloning procedures (okayama and berg, 1982; gubler and hoffmann, 1983) using vaccinia late mrna appeared to be, in our hands, highly susceptible to artifacts. we therefore decided to sequence the primer extended first strand cdnas from the translocated 11k-dhfr gene construct and from the authentic 11k and 4b promoters by the method of maxam and gilbert (1980). for this purpose the extended cdnas were extracted from denaturing gels and sequenced. in all three cases the sequence revealed a homopolymeric t-stretch of more than 20 nucleotides in the complementary strand 5' of the s1 "start sites." only the result of cdna sequencing of the translocated 11k promoter transcripts is shown (figure 5). we obtained bands in all four lanes, with the first ten nucleotides indicating the end of the cdna. alternatively, the cleavage in all four lanes might be due to misincorporations of the reverse transcriptase, which has been reported to occur following homopolymeric stretches (murphy et al., 1986).
figure legend: rna from cells infected with the 11k-dhfr recombinant virus was primer extended using a 5'-labeled oligonucleotide (+24 to +46) as described in experimental procedures. the extended cdna products of 70-80 nucleotides in length were isolated from a denaturing polyacrylamide gel. the cdna was sequenced using the method of maxam and gilbert (1980).
we have analyzed thus far the 5' end of the mrna in an indirect manner using reverse transcriptase. the obtained results now enable us to construct an artificial s1 probe complementary to the discontinuous in vivo mrna in which a stretch of 80 a-residues replaces promoter sequences upstream of position -1 of the translocated 11k-dhfr gene construct and of the authentic 11k gene (figures 6a and 6b, lane 2). as a control we have performed s1 mapping using genomic vaccinia dna as a probe (figures 6a and 6b, lane 1). the 3' ends of these genomic s1 probes consist of nonhomologous plasmid sequences. this allows the separation of the input dna probe from the s1 fragments protected by readthrough transcripts. the digestion with nuclease s1 was performed at low temperature to minimize nibbling of the nuclease at the 5' end of the rna. the results shown in figures 6a and 6b (lane 2) confirm the presence of the poly(a) stretch of approximately 35 nucleotides at the 5' end of the mrna. the s1-protected fragments of 350 and 290 nucleotides in length, respectively, which occur after mapping with the artificial dna probe containing the 5' a-stretch (figures 6a and 6b, lane 2), map at the position of the s1 "start site" and correspond to readthrough transcripts that do not have an a-stretch at the 5' end. we observed the same percentage of transcripts initiated at the translocated 11k promoter as compared to readthrough transcripts after s1 mapping using either the genomic or the artificial s1 probes (figure 6a, compare the band migrating at 465 in lane 1 with the band at 350 nucleotides in lane 2).
this indicates that the majority of the transcripts initiated at the translocated promoter contain the a-stretch at their 5' ends. readthrough of the rna polymerase is again more prominent at the tk locus in the middle of the genome than at the location of the authentic 11k gene in the left-hand side of the genome. we have observed the same phenomenon in the rna sequencing experiments (figures 4a and 4b). we conclude that vaccinia late mrnas are discontinuously synthesized with a poly(a) stretch 5' of the s1 "start site." unusual mechanisms of gene transcription have been reported for coronavirus (spaan et al., 1982) and trypanosomes (murphy et al., 1986; sutton and boothroyd, 1986). vaccinia virus, or the poxviruses in general, might also have developed unique mechanisms as a consequence of their cytoplasmic location. the virus does in fact have its own transcription and replication machinery (moss, 1976). furthermore, recent studies concerning the functional analysis of viral promoters confirm the earlier observation that they possess unique features that are not found in their eukaryotic counterparts (cochran et al., 1985b; rosel and moss, 1985; hanggi et al., 1986). the observation that mutations within the conserved sequence of the translocated 11k promoter can result in a strong reduction of translation without affecting transcription led us to analyze the structure of the 5' end of the messengers. the possibility that the sequences following the conserved taaat motif might be involved in maturation of the rna transcripts is further suggested by the fact that there is a sequence conservation downstream of the taaat motif (hanggi et al., 1986). we have shown by primer elongation that the 5' end of the mature late messengers does not coincide with the site of transcription initiation as determined by s1 mapping (figures 1 and 3).
an rna leader sequence of approximately 35 bases is linked to rna transcripts originating from the 11k translocated promoter and from the authentic 11k and 4b promoters. we have shown by cap selection using a rabbit anti-m7g antiserum that the extended transcripts represent mature mrnas with an m7g cap structure at their 5' terminus. treatment of the cdna-rna hybrids obtained after primer extension with different rnaases prior to cap selection did not remove the cap structure (figure 2). this demonstrates that we are obtaining full-length cdna and not premature stops of the reverse transcriptase within the discontinuous leader rna. sequencing of the rna transcripts and of the first strand cdna revealed that the leader rna consists of a homopolymeric stretch of at least 20 a-residues (figures 4 and 5). the presence of other bases in front of the a-stretch cannot formally be excluded on the basis of the sequencing data. it has been described that reverse transcriptase can misincorporate nucleotides after homopolymeric stretches (murphy et al., 1986). this phenomenon could explain the observed cleavage in all four lanes in the 5' end of the cdna sequence (figure 5). however, s1 mapping of rna transcripts with an artificial dna probe containing a stretch of 80 a-residues upstream of position -1 replacing promoter sequences confirms the presence of a poly(a) stretch of approximately 35 nucleotides (figures 6a and 6b). [...] late transcripts is a general phenomenon. in contrast, early transcripts are not discontinuous; the 5' end determined by s1 mapping coincides with the 5' end of the cdna clones (venkatesan et al., 1981). the data that we have presented are to some extent ambiguous with respect to the length of the poly(a) stretch. reverse transcriptase experiments point in the direction of a heterogeneity in the length of the a-stretch of 5-10 nucleotides (figures 1-5) which is not observed in the s1 experiment (figure 6).
this discrepancy might be due to an inability of the reverse transcriptase to read up to the methylated g-residue of the cap structure. furthermore, homopolymeric stretches are relatively poorly transcribed even at high enzyme concentrations (murphy et al., 1986). the rnaase-cap selection experiments cannot completely rule out this possibility because the poly(a) sequence might be less susceptible to the rnaases under the test conditions. the exact length of the poly(a) stretch in the leader rna can probably only be determined after the elucidation of the mechanism of poly(a) addition. the results that we have presented here do not elaborate on the mechanism for the addition of the a-stretch to late transcripts. cis splicing seems to be unlikely because vaccinia promoter-gene constructs can be expressed and their transcripts translated in a transient assay in which the chimeric gene is not integrated into the viral genome (cochran et al., 1985a; our unpublished observations). we are currently investigating whether these transcripts have a poly(a) leader sequence. other mechanisms reported for synthesis of discontinuous transcripts, i.e., trans splicing in trypanosomes (murphy et al., 1986; sutton and boothroyd, 1986) and primed transcription with a small rna leader molecule in influenza virus (beaton and krug, 1981) or coronavirus (spaan et al., 1982; makino et al., 1986), might also apply to the synthesis of vaccinia late transcripts. the fact that we do not find a eukaryotic consensus splice acceptor site at or close to the putative junction of the discontinuous vaccinia transcripts does not exclude a trans splicing mechanism. vaccinia as a cytoplasmic virus has developed its own transcription machinery; the conserved taaat motif might represent the vaccinia counterpart of the eukaryotic splice acceptor site. the leader rna addition might also be the result of a processing mechanism based on the addition of preexisting and possibly capped poly(a) rna molecules.
initial attempts to detect such molecules in vivo have proved unsuccessful. it has been reported, however, that purified vaccinia virions produce high levels of polyriboadenylic acids in vitro (bablanian and banerjee, 1986). these authors postulated a role of the poly(a) rna in translation control and shut-off of host translation. they did not determine, however, whether these poly(a) rna molecules are present in vivo and whether they are capped. an addition of a poly(a) rna and a subsequent capping of the rna might also be possible since an rna-specific phosphate kinase activity is detectable in vaccinia virions (spencer et al., 1978). furthermore, poly(a) polymerase and capping enzyme are reported to be present in purified vaccinia virions (baroudy and moss, 1980; wei and moss, 1974 rna of rk-13 or hela s3 cells infected with vaccinia virus recombinants or the wild-type strain was extracted with guanidinium hydrochloride followed by cscl purification as described by maniatis et al. (1982). s1 mapping of the mrna and of the first strand cdna was performed according to maniatis et al. (1982). the hybridization was performed at 44°c and the nuclease s1 digestion at icc. primer elongation was performed using y-labeled synthetic oligonucleotides. the labeled oligonucleotide was incubated at 55°c with the rna, slowly cooled down to 42°c and coprecipitated with 0.6 volumes of isopropanol in the presence of 0.6 m naac. ten units of reverse transcriptase and 20 u of rnaase inhibitor were used with 10 µg of total rna. in the rna sequencing experiments, the deoxy- to dideoxynucleotide ratios were 5.5, 1.4, 1.4, and 2.7 for the t-, a-, g-, and c-reactions, respectively. the cdna-rna products from the primer elongation were phenol extracted and precipitated. the hybrids were incubated overnight at 4°c in binding buffer (10 mm tris-hcl (ph 8.0), 150 mm nacl, and 0.1% np40) with the polyclonal rabbit anti-m7g in the presence of 1 mg/ml of heparin (174,000 iu/g).
the nucleic acid-antibody complex was subsequently incubated for 1 hr at room temperature with protein a-sepharose in the same buffer containing 20 mg/ml of heparin. the beads were washed three times with binding buffer and twice with binding buffer containing 500 mm nacl. the bound nucleic acids were removed by sds-proteinase k treatment, phenolized, precipitated, and separated on a sequencing gel. rnaase treatment of the cdna-rna hybrid was performed with rnaase a at a final concentration of 10 pg/ml in 10 mm tris-hcl (ph 7.5) and 1 mm edta for 10 min at room temperature; the nucleic acids were then phenol extracted, and the cdna-rna hybrids immune-precipitated as described. amv reverse transcriptase and rnaase inhibitor were purchased from genofit, geneva, or boehringer mannheim. the rnaases a, t1, and cl3 and restriction endonucleases were purchased from boehringer mannheim. protein a-sepharose cl-4b and nuclease s1 were obtained from pharmacia, sweden, and heparin was obtained from serva, heidelberg. radioactive nucleotides were purchased from amersham. rk-13 cells were obtained from flow laboratories and the media for cell culture from gibco.
poly(riboadenylic acid) preferentially inhibits in vitro translation of cellular mrnas compared with vaccinia virus mrnas: possible role in vaccinia virus cytopathology
5'-terminal cap structure in eukaryotic messenger ribonucleic acids
purification and characterization of dna-dependent rna polymerase from vaccinia virions
selected host-cell capped rna fragments prime influenza viral rna transcription in vivo
one hundred base pairs of 5' flanking sequence of a vaccinia virus late gene are sufficient to temporally regulate late transcription
eukaryotic transient expression system dependent on transcription factors and regulatory dna sequences of vaccinia virus
in vitro mutagenesis of the promoter region for a vaccinia virus gene: evidence for tandem early and late regulatory signals
a simple and very efficient method for generating cdna libraries
conserved taaat motif in vaccinia virus promoters: overlapping tata box and site of transcription initiation
the purification and properties of chicken liver rnase
vaccinia virus: a selectable eukaryotic cloning and expression vector
leader sequences of murine coronavirus mrnas can be freely reassorted: evidence for the role of free leader rna in transcription
molecular cloning: a laboratory manual
sequencing end-labeled dna with base-specific chemical cleavages
poxviruses
antibody-nucleic acid complexes. immunospecific retention of globin messenger ribonucleic acid with antibodies specific for 7-methylguanosine
identification of a novel y branch structure as an intermediate in trypanosome mrna processing: evidence for trans splicing
poly(a) sequences of vaccinia virus messenger rna: nature, mode of addition and function during translation in vitro and in vivo
high-efficiency cloning of full-length cdna
construction of poxviruses as cloning vectors: insertion of the thymidine kinase gene from herpes simplex virus into the dna of infectious vaccinia virus
transcriptional and translational mapping and nucleotide sequence analysis of a vaccinia virus gene encoding the precursor of the major core polypeptide 4b
recombinant vaccinia virus as new live vaccines
sequence relationship between the genome and the intracellular rna species 1, 3, 6 and 7 of mouse hepatitis virus strain a59
enzymatic conversion of 5'-phosphate-terminated rna to 5'-di- and triphosphate-terminated rna
evidence for trans splicing in trypanosomes
a modified structure at the 5'-terminus of mrna of vaccinia virus
distinctive nucleotide sequences adjacent to multiple initiation and termination sites of an early vaccinia virus gene
methylation of newly synthesized viral messenger rna by an enzyme in vaccinia virus
methylated nucleotides block 5'-terminus of vaccinia virus messenger rna
mapping of a gene coding for a major late structural polypeptide in the vaccinia virus genome
multiple 5' ends of mrna encoding vaccinia virus growth factor occur within a series of repeated sequences downstream of t clusters
we thank claudio schneider for his advice on cap selection experiments and for his generous gift of anti-cap antibodies, our colleagues for critical reading of the manuscript, heide seifert for secretarial help, and jacky schmitt for excellent technical assistance. the costs of publication of this article were defrayed in part by the payment of page charges.
this article must therefore be hereby marked "advertisement" in accordance with 18 u.s.c. section 1734 solely to indicate this fact. key: cord-345103-b2wkm03g authors: yao, hangping; song, yutong; chen, yong; wu, nanping; xu, jialu; sun, chujie; zhang, jiaxing; weng, tianhao; zhang, zheyuan; wu, zhigang; cheng, linfang; shi, danrong; lu, xiangyun; lei, jianlin; crispin, max; shi, yigong; li, lanjuan; li, sai title: molecular architecture of the sars-cov-2 virus date: 2020-09-06 journal: cell doi: 10.1016/j.cell.2020.09.018 sha: doc_id: 345103 cord_uid: b2wkm03g sars-cov-2 is an enveloped virus responsible for the covid-19 pandemic. despite recent advances in the structural elucidation of sars-cov-2 proteins, the detailed architecture of the intact virus remains to be unveiled. here we report the molecular assembly of the authentic sars-cov-2 virus using cryo-electron tomography (cryo-et) and subtomogram averaging (sta). native structures of the s proteins in both pre- and postfusion conformations were determined to average resolutions of 8.7-11 å. compositions of the n-linked glycans from the native spikes were analyzed by mass spectrometry, which revealed that the overall processing states of the native glycans are highly similar to those of the recombinant glycoprotein glycans. the native conformation of the ribonucleoproteins (rnp) and its higher-order assemblies were revealed. overall, these characterizations have revealed the architecture of the sars-cov-2 virus in exceptional detail, and shed light on how the virus packs its ∼30 kb long single-segmented rna in the ∼80 nm diameter lumen. as of august 31st, 2020, a total of over 25 million cases of covid-19 were reported and more than 850 thousand lives were claimed globally (https://covid19.who.int). the causative pathogen, severe acute respiratory syndrome coronavirus 2 (sars-cov-2), is a novel β-coronavirus (wu et al., 2020; zhou et al., 2020).
sars-cov-2 encodes at least 29 proteins in its (+) rna genome, four of which are structural proteins: the spike (s), membrane (m), envelope (e) and nucleocapsid (n) proteins (kim et al., 2020). the ~600 kda, trimeric s protein, one of the largest known class-i fusion proteins, is heavily glycosylated with 66 n-linked glycans (watanabe et al., 2020a; wrapp et al., 2020). each s protomer comprises the s1 and s2 subunits, and a single transmembrane (tm) anchor (wrapp et al., 2020). the s protein binds to the cellular surface receptor angiotensin-converting enzyme-2 (ace2) through the receptor binding domain (rbd), an essential step for membrane fusion (hoffmann et al., 2020; lan et al., 2020; shang et al., 2020; wang et al., 2020; yan et al., 2020; zhou et al., 2020). activation of s requires cleavage at s1/s2 by a furin-like protease, after which s undergoes a conformational change from prefusion to postfusion (belouzard et al., 2009; kirchdoerfer et al., 2018; simmons et al., 2004; simmons et al., 2013; song et al., 2018). several prefusion conformations have been resolved for the s protein, wherein the three rbds display distinct orientations, "up" or "down" (wrapp et al., 2020). the receptor binding sites are exposed only when the rbds adopt an "up" conformation. the "rbd down", "one rbd up" and "two rbd up" conformations have been observed in recombinantly expressed s proteins of sars-cov-2 (henderson et al., 2020; walls et al., 2020; wrapp et al., 2020). upon activation, s follows a classic pathway among class-i fusion proteins (rey and lok, 2018): it undergoes dramatic structural rearrangements involving shedding its s1 subunit and inserting the fusion peptide (fp) into the target cell membrane (cai et al., 2020). following membrane fusion, s transforms to a needle-shaped postfusion form, having three helices entwining coaxially (cai et al., 2020).
despite the efforts in elucidating the sars-cov-2 host recognition and entry mechanism at near-atomic resolution using recombinant proteins, high-resolution information regarding the in situ structures and landscape of the authentic virus is still needed. coronaviruses have the largest genomes among all rna viruses. it is enigmatic how the n protein oligomerizes, organizes, and packs the ~30 kb long single-stranded rna in the viral lumen. early negative-staining electron microscopy of coronaviruses showed single-strand helical rnps with a diameter of ~15 nm (caul and eggleston, 1979). cryo-et of sars-cov revealed, at ~4-5 nm resolution, that rnps organized into lattices underneath the envelope (neuman et al., 2006). however, such ultrastructure was not observed in the mouse hepatitis virus (mhv), the prototypic β-coronavirus (barcena et al., 2009). no molecular model exists so far for the coronavirus rnp, and little is known about the architecture, assembly and rna packaging of the rnps of other (+) rna viruses. to address these questions, we combined cryo-et and sta for the imaging analysis of 2,294 intact virions propagated from an early viral strain (yao et al., 2020). to our knowledge, this is the largest cryo-et data set of sars-cov-2 virus to date. here we report the architecture and assembly of the authentic sars-cov-2 virus. sars-cov-2 virions (id: zju_5) were collected on january 22nd, 2020, from a patient with severe symptoms, and were propagated in vero cells. the patient was infected during a conference with attendees from wuhan (yao et al., 2020). for cryo-em analysis, the viral sample was fixed by paraformaldehyde, which has minor effects on protein structure at 7-20 å resolution (li et al., 2016; wan et al., 2017). intact in total, 56,832 spikes were manually identified from the virions, approximately 97% of which are in the prefusion conformation and 3% in the postfusion conformation (method details).
an average of 26±15 prefusion s were found randomly distributed on each virion (figures 1b-c). the spike copy number per virion is comparable to hiv (liu et al., 2008), but ~5 times lower than that of the lassa virus (lasv) (li et al., 2016) and ~10 times lower than that of the influenza virus (harris et al., 2006). 18,500 rnps were manually identified in the viral lumen (table s1), giving an average of 26±11 rnps per virion. however, since the viral lumen is tightly packed with rnps and electron opaque, the actual number of rnps per virion was estimated to be 20-30% more, i.e. 30-35 rnps per virion. regularly ordered rnp ultrastructures were occasionally observed (figure s5a), indicating the rnps could form local assemblies. two conformations of the prefusion s, namely the rbd down and one rbd up conformations, from inactivated sars-cov-2 virions were classified and reconstructed to 8.7 å and 10.9 å resolution by sta, with local resolution reaching 7.8 å (figures s3a-s3c). the heptad repeat 1 (hr1) and central helix (ch) domains of the s2 subunits represent the best resolved domains (figure s3d). the proportion of the rbd down conformation among all prefusion s was estimated to be 54% per virion (figure 1c). the membrane proximal stalk of s represented the most poorly resolved region, with a local resolution of ~20 å, showing no trace of the tm or membrane in the structure (figure 1d). when the tomogram slices were scrutinized, spike populations that either stand perpendicular to or lean towards the envelope were observed, suggesting that the tm was averaged out in the map. refined orientations of the prefusion s showed that they rotate around their stalks almost freely outside the envelope, leaning at an average angle of 40°±20° relative to the normal axis of the envelope (figures 1b and 1d).
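The leaning angle reported above is an angle between each spike's refined axis and the local envelope normal. A minimal sketch of how such an angle could be computed is given below. It assumes a roughly spherical virion, so that the outward normal at the spike base is the radial direction from the virion centre; the function names and the input layout are hypothetical and are not taken from the authors' pipeline.

```python
import math
import statistics


def lean_angle_deg(center, base, axis):
    """Angle in degrees between a spike axis and the outward membrane normal.

    Assumption: the virion is approximately spherical, so the normal at the
    spike base is the vector from the virion centre to the base position.
    """
    normal = [b - c for b, c in zip(base, center)]
    n_len = math.sqrt(sum(v * v for v in normal))
    a_len = math.sqrt(sum(v * v for v in axis))
    if n_len == 0 or a_len == 0:
        raise ValueError("normal and axis must be non-zero vectors")
    cos_t = sum(n * a for n, a in zip(normal, axis)) / (n_len * a_len)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def summarize(angles):
    """Mean and population standard deviation of a list of angles."""
    return statistics.mean(angles), statistics.pstdev(angles)
```

A spike standing perpendicular to the envelope gives 0°, and one lying in the membrane plane gives 90°. For ellipsoidal virions the radial approximation breaks down, and a normal taken from a fitted membrane surface would be more appropriate.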
the rotational freedom of the spikes is allowed by their low population density, which is prominently distinct from other enveloped viruses possessing class-i fusion proteins (harris et al., 2006; li et al., 2016; liu et al., 2008). interestingly, a minor population of y-shaped spike pairs having two heads and one combined stem was observed (figures s2a and s2c), which possibly represent spikes intertwined at their stems. these observations suggest that the sars-cov-2 spikes possess unusual freedom on the viral envelope. such unique features may facilitate the virus in exploring the surrounding environment and better engaging with the cellular receptor ace2, allowing multiple spikes to bind with one ace2 or one spike with multiple ace2 simultaneously. however, the sparsely packed spikes on the viral envelope are also more vulnerable to neutralizing antibodies that bind the otherwise less accessible domains (chi et al., 2020) or glycan holes (walls et al., 2019). our observations on the structures and landscape of the intact sars-cov-2 are consistent with two other cryo-et studies that appeared during the same period (turoňová, 2020; ke et al., 2020). the native structures of s in the rbd down and one rbd up conformations were similar to the rigidly fitted recombinant protein structures (pdb: 6xr8 and 6vyb) (cai et al., 2020; walls et al., 2020), except for the n terminus domain (ntd). comparison between the rigidly and flexibly fitted pdb: 6xr8 suggested that the ntd on the native spike structure shifted 9 å (centroid distance) away from s2 (figure s3e). the slight dilation and lower local resolution (figures s3a and s3b) of the ntd on the native spike relative to recombinant spike structures were also observed in the other cryo-et structures . it is known that the ntd exhibits certain mobility as a rigid body (cai et al., 2020; walls et al., 2020; wrapp et al., 2020; xiong et al., 2020).
it is possible that, through a large data set and classification, the near-atomic resolution cryo-em reconstruction of the spike represents its metastable conformation, while the cryo-et reconstructions represent an average of various dynamic states of the ntd. ten n-linked glycans are visible on the rbd down and seven on the one rbd up conformations, of which n61, n282, n801, n1098 and n1134 were best resolved. interestingly, densities for glycans n1158 and n1173/n1194 are visible on the stem of the spike (figures 2b and s4c). in general, the glycan densities observed on the native spike fit well with the full-length recombinant structure (cai et al., 2020); however, they are bulkier than those observed in the tm truncated recombinant spikes (wrapp et al., 2020). similar observations were reported by two studies of the native spike structures that appeared during the same period (turoňová, 2020; ke et al., 2020). we further determined the native glycan identity by analyzing the virus sample using mass spectrometry (ms). the viral particles, with or without pngase f digestion, were resolved on sds-page. after pngase f treatment, the s1 and s2 subunits were reduced in weight by ~30 kda and ~20 kda, respectively (figure 2c). the bands corresponding to s1 and s2 before pngase f treatment were analyzed by ms to reveal the glycan compositions at each of the 22 glycosylation sites (figures 2d and s4b). the overall processing states of the native glycans are highly similar to those of the recombinant glycoprotein glycans (figures s4a and s4b) (watanabe et al., 2020a), a feature shared with mers and sars-cov (walls et al., 2019). populations of under-processed oligomannose-type glycans are found at the same sites as seen in the recombinant material, including at n234 where the glycan is suggested to have a structural role (casalino et al., 2020).
however, as is observed at many sites on hiv (cao et al., 2018; struwe et al., 2018), the virus exhibits somewhat lower levels of oligomannose-type glycosylation compared to the recombinant, soluble mimetic. overall, the presence of a substantial population of complex-type glycosylation suggests that the budding route of sars-cov-2 into the lumen of endoplasmic reticulum-golgi intermediate compartments (ergic) is not an impediment to glycan maturation and is consistent with both analysis of sars-cov glycans (ritchie et al., 2010) and the identification of neutralizing antibodies targeting the fucose at the n343 glycan on sars-cov-2 (pinto et al., 2020). furthermore, the lower levels of oligomannose-type glycans compared to hiv and lasv are also consistent with lower glycan density (watanabe et al., 2020b; watanabe et al., 2018). comparing the structures of our native prefusion s with the recombinant ones, we conclude that 1) the n-linked glycans present on the native spike are bulkier and contain elevated levels of complex-type glycans; 2) the recapitulation of the main features of native viral glycosylation by soluble, trimeric recombinant s glycoprotein is encouraging for vaccine strategies utilizing recombinant s protein. apart from the triangular prefusion s, needle-like densities were occasionally observed on the viral envelope ( figure s2b (cai et al., 2020; fan et al., 2020). in comparison to the prefusion conformation, the fixation in orientation to the envelope suggests a dramatic conformational reordering of the stem region to achieve the postfusion conformation. the postfusion s were found only on a small figure 3b ), compared to ~15 nm average distance between the nearest prefusion s.
unlike the sars-cov virus, which was estimated to possess an average of ~50-100 spikes per virion (neuman et al., 2011), the sars-cov-2 virus possesses approximately half as many prefusion s and occasionally some postfusion s. the postfusion s observed on the sars-cov-2 virus may come from 1) products of occasional, spontaneous dissociation of s1 (cai et al., 2020), which was cleaved by host proteinases; 2) syncytium naturally formed on infected cells , when budding progeny virions carried a few residual postfusion s from the cell surface; 3) the sample preparation procedure, as cryo-em images of β-propiolactone fixed viruses showed that most spikes present on the virus are postfusion-like (gao et al., 2020). such instability of the prefusion s was reported for other β-coronaviruses (pallesen et al., 2017). in addition, the distribution graph (figure 3c) implies that the kinetically trapped prefusion s is more fragile than the postfusion s, and could even dissociate from the virus. this speculation is based on the fact that intracellular virions possess more spikes on average (klein et al., 2020) than the extracellular virions reported by us and others (turoňová, 2020; ke et al., 2020), and on the occasional observation of spike-less "bald" viruses in our data. in summary, we speculate that the sars-cov-2 prefusion s are unstable, indicating that the distribution of solvent exposed epitopes on the virions is more complicated than the observations on the recombinant proteins. our observation has implications for efficient vaccine design and neutralizing antibody development, which prefer a sufficient number of stable antigens. it remains enigmatic how coronaviruses pack the ~30 kb rna within the ~80 nm diameter viral lumen; whether the rnps are ordered relative to each other to avoid rna entangling, knotting or even damage; and whether the rnps are involved in virus assembly.
when raw tomogram slices were inspected, densely packed, bucket-like densities were discernible throughout the virus lumen, some of which appeared to be locally ordered (figure s5a). combining previous cryo-et observations on coronaviruses (barcena et al., 2009) and sds-page/ms analysis (figure 2c), the densities most likely belonged to the rnps. in total 18,500 rnps were picked in the viral lumen, and initially aligned using a sphere as the template and a large spherical mask. a bucket-like conformation with few structural features emerged adjacent to the density for the lipid bilayer (figure s5d), suggesting that a significant number of rnps were membrane proximal. alignment using a small spherical mask revealed a 13.1 å resolution reverse g-shaped architecture of the rnp, measuring 15 nm in diameter and 16 nm in height (figure 4b). its shape compares well with a recently reported sars-cov-2 rnp conformation (klein et al., 2020), as well as the in situ conformation of the chikungunya viral rnp, which is also positive stranded (jin et al., 2018), but is different from the mhv rnp released using detergent (gui et al., 2017). the map was segmented into five head-to-tail reverse l-shaped densities, each fitted with a pair of n proteins (n terminus domain (n_ntd): 6wkp, c terminus domain (n_ctd): 6wji) dimerized by the n_ctd (chen et al., 2007) (figures s6a and s6c). we analyzed the electrostatic potential distribution on the surface of the decamer, and suggested a tentative structural model of the rna-wound rnp (figures s6b and s6d). interestingly, an early observation on the mhv rnps showed ~15 nm diameter helices with five subunits per turn (caul and eggleston, 1979). due to the limited resolution and little previous structural knowledge about the (+) rna virus rnps, our model should be interpreted with caution.
further 2d classification of the rnps revealed three classes: 1) closely packed against the envelope, 2) hexagonally and 3) triangularly packed rnps (figure 4a). following 3d refinement, a membrane proximal, "eggs-in-a-nest" shaped rnp assembly (referred to as the "hexon"), and a membrane-free, "pyramid" shaped rnp assembly (referred to as the "tetrahedron") emerged . projection of the two class averages back onto their refined coordinates revealed that the majority of hexons came from spherical virions, while more tetrahedrons came from ellipsoidal virions (figure 4d). this was quantified by statistics: ellipsoidal virions tend to pack more rnp tetrahedrons (figure 4e). furthermore, the spacing between two neighboring rnps (~18 nm) is the same for both the tetrahedrons and hexons, and some tetrahedrons could assemble into hexons when projected onto their in situ coordinates (figure s5b), suggesting that the rnp triangle is a key and basic packing unit throughout the virus. we further propose that the rnps are involved in coronavirus assembly and help strengthen the virus against environmental and physical challenges, as purified virions remained intact after five cycles of freeze-and-thaw treatment (figure s7). such involvement of rnps in viral assembly was also reported by neuman et al. (2006), who showed that rnps form a lattice underneath the envelope, and was seen in intracellular virions (klein et al., 2020). however, it remains unanswered whether the ultrastructures of rnps are assembled by rna, the transmembrane m or e proteins, the rnp itself, or a combination of the above. solving rnps to subnanometer resolution was hindered by the crowding of rnps against each other (figures s5e-s5g). furthermore, structural features of the rnps in higher order assemblies were smeared, possibly due to the symmetry mismatch between individual rnps and the assembly (figures 4c and 4d). no virus with strictly ordered rnps throughout the lumen was found by projections.
we conclude that the native rnps are highly heterogeneous, densely packed yet locally ordered in the virus, possibly interacting with the rna in a beads-on-a-string stoichiometry. wrote the manuscript. all authors critically revised the manuscript. the authors declare no competing interests. (2) purified zju_5; (3) zju_5 treated with pngase f and (4) pngase f as control. after pngase f treatment, the molecular weight of the s1 subunit is reduced by ~30 kda, and that of s2 by ~20 kda. illustrated by projecting the refined structures onto their coordinates and overlaying with the raw tomogram (lowpassed to 80 å resolution). further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, sai li (sai@tsinghua.edu.cn). this study did not generate new unique reagents. the purification, concentration, biochemical analysis and sample preparation for electron microscopy of inactivated virions were carried out in a bsl-2 lab. for cryo-et, the fixed virions were pelleted through a 30% sucrose cushion by ultracentrifugation (beckman, in) at 100,000 g for 3 hours at 4 °c and resuspended in 60 µl hepes-saline buffer containing 10 mm hepes, ph 7.3, 150 mm nacl at 4 °c overnight (neuman et al., 2008). to deglycosylate s protein of virions, 20 µg purified virion sample was treated with 500 units glycopeptide fragmentation data were extracted from the raw file using byonic™ (version 2.8.2). the ms data were searched using the protein metrics 309 n-glycan library. the search criteria were as follows: non-specificity; carbamidomethylation (c) was set as the fixed modification; oxidation (m) was set as the variable modification; precursor ion mass tolerances were set at 20 ppm for all ms acquired in an orbitrap mass analyzer; and the fragment ion mass tolerances were set at 0.02 da for all ms2 spectra acquired. the intensities of each glycan type at an identical site were combined and analyzed for proportion.
data with a score less than 30 were discarded. the glycans were classified into oligomannose, hybrid and complex type based on composition. hybrid and complex type glycans were subdivided according to fucose component and antenna. 7 µl virus sample was applied onto a glow discharged copper grid coated with holey carbon (r 2/2; quantifoil, jena, germany), and subsequently dipped onto 500 µl hepes-saline buffer for 1 second to clear the sucrose. a drop of 3 µl gold fiducial beads (10 nm diameter; aurion, the netherlands) was applied and the grid was blotted for 4.5 s, then vitrified by plunge-freezing into liquid ethane using a cryo-plunger 3 (gatan, ca). fixed cells cultured on grids were applied with 2 µl fiducial beads (10 nm diameter; aurion, the netherlands) prior to single-sided plunge-freezing. the grids were imaged on a titan krios microscope (thermo fisher scientific, hillsboro, or) operated at a voltage of 300 kv equipped with an energy filter (slit width 20 ev; gif quantum ls, gatan, ca) and k3 direct electron detector (gatan, ca). virions were recorded in super-resolution mode at a nominal magnification of 64,000×, resulting in a calibrated pixel size of 0.68 å. 361 sets of tilt-series data were collected using the dose-symmetric scheme (hagen et al., 2017) from -60° to 60° at 3° steps and at various defocus values between -1.7 and -5 µm in serialem (mastronarde, 2005). at each tilt, a movie consisting of 8 frames was recorded with 0.0265 s/frame exposure, giving a total dose of 131.2 e⁻/å² per tilt series. for the freeze-and-thaw test, 1.5 µl purified virions were diluted in 6 µl hepes-saline buffer at 4 °c, then subjected to five cycles of freezing in liquid nitrogen and thawing in a water bath at 37 °c. for the negative staining microscopy, 4 µl freeze-and-thaw sample was applied on copper grids (zhongjingkeyi technology, beijing, china), stained using 2% uranium acetate and imaged using a tecnai spirit tem (thermo fisher scientific, hillsboro, or).
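The acquisition parameters above (tilts from -60° to 60° at 3° steps, 8 frames of 0.0265 s per tilt, 131.2 e⁻/å² per tilt series) can be cross-checked with a short sketch. The tilt ordering below is a simplified dose-symmetric ordering that alternates sign at every step; the exact grouping used in the serialem script is not stated in the text, and the even split of the total dose across tilts is an assumption made here for illustration. All function names are hypothetical.

```python
def dose_symmetric_order(max_tilt=60, step=3):
    """Simplified dose-symmetric order: 0, +step, -step, +2*step, -2*step, ...

    Assumption: one tilt per side before switching sign; real acquisition
    scripts may group several tilts per side.
    """
    order = [0]
    for k in range(1, max_tilt // step + 1):
        order.extend([k * step, -k * step])
    return order


def per_tilt_values(total_dose, n_tilts, frames_per_tilt, seconds_per_frame):
    """Dose per tilt, dose per frame and exposure time per tilt.

    Assumption: the total dose is spread evenly over all tilts and frames.
    """
    dose_per_tilt = total_dose / n_tilts
    return {
        "dose_per_tilt": dose_per_tilt,
        "dose_per_frame": dose_per_tilt / frames_per_tilt,
        "exposure_per_tilt_s": frames_per_tilt * seconds_per_frame,
    }


tilts = dose_symmetric_order(60, 3)
values = per_tilt_values(131.2, len(tilts), 8, 0.0265)
print(len(tilts), values)
```

Under these assumptions the scheme has 41 tilts, which corresponds to about 3.2 e⁻/å² and 0.212 s of exposure per tilt, or about 0.4 e⁻/å² per frame.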
tilt series data were analysed in a high-throughput pre-processing suite developed in our lab. the electron beam induced motion was corrected using a combination of motioncor (li et al., 2013) and motioncor2 (zheng et al., 2017) by averaging eight frames for each tilt. defocuses of the tilt series were measured using gctf (zhang, 2016). the tilt series were contrast transfer function corrected using novactf (turonova et al., 2017). 319 tilt-series with good fiducial alignment and relatively thin ice thickness were reconstructed to tomograms by weighted back projection in imod (kremer et al., 1996), resulting in a final pixel size of subtomogram averaging was done using dynamo (castano-diez et al., 2012). for the prefusion s reconstruction, 54,878 subtomograms were extracted from 4 × binned tomograms into boxes of 96×96×96 voxels and emd-21452 was used as the template for their alignment. the resolution was restricted to 40 å and c3 symmetry was applied at this stage. 8,562 spikes present at the edges of the tomograms were removed to minimize the impact of the air-water interface effect and incomplete signal on the structure. the remaining particles were subjected to multi-reference alignment imposing c1 symmetry using emd-21452 and emd-21457 lowpassed to 30 å resolution as the templates, resulting in 25,236 spikes (54.5%) classified into the rbd down conformation and 21,080 spikes (45.5%) into the one rbd up conformation. coordinates of the two spike conformations were used to extract boxes of 160×160×160 voxels from the 2 × binned tomograms for further alignment. to prevent overfitting, a customized "gold-standard adaptive bandpass filter" method was used for the alignment at this stage, and a criterion of 0.143 for the fourier shell correlation was used to estimate the resolution. the 2 × binned spikes in the rbd down and one rbd up conformations were independently further aligned imposing c3 or c1 symmetry respectively, to 9.5 and 10.9 å resolution.
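Two pieces of arithmetic in this passage lend themselves to a quick check: the particle bookkeeping behind the 54.5% / 45.5% split, and reading a resolution off a fourier shell correlation curve at the 0.143 criterion. The sketch below is illustrative only. The function names are hypothetical, the fsc values in the usage note are made-up numbers rather than data from this study, and linear interpolation between shells is one common convention, not necessarily the one used by the authors' customized method.

```python
def classification_summary(extracted, removed, class_counts):
    """Percentage of the remaining particles assigned to each class."""
    remaining = extracted - removed
    if sum(class_counts.values()) != remaining:
        raise ValueError("class counts do not add up to the remaining particles")
    return {name: 100.0 * n / remaining for name, n in class_counts.items()}


def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (in å) where the FSC curve first drops below the threshold.

    freqs: spatial frequencies in 1/å, ascending; fsc: FSC value per shell.
    The crossing is located by linear interpolation between adjacent shells.
    Returns None if the curve never drops below the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            crossing = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / crossing
    return None


shares = classification_summary(
    54878, 8562, {"rbd_down": 25236, "one_rbd_up": 21080}
)
print({k: round(v, 1) for k, v in shares.items()})
```

With the counts quoted in the text, 54,878 - 8,562 = 46,316 particles remain, and the two classes account for all of them. As a toy example of the threshold function, a curve with fsc values 1.0, 0.243 and 0.043 at 0.05, 0.10 and 0.15 å⁻¹ crosses 0.143 at 0.125 å⁻¹, i.e. 8 å.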
finally, the rbd down spike subtomograms were extracted from unbinned tomograms into boxes of 256×256×256 voxels and aligned to 8.7 å resolution. the prefusion s maps were lowpassed according to the estimated local resolutions of the reconstructed subunits. universal empirical b-factors of -1200 and -2000 were applied to sharpen the rbd down and one rbd up spikes, respectively. for the postfusion s reconstruction, 2,010 subtomograms were extracted from 4 × binned tomograms into boxes of 96×96×96 voxels, which were averaged to give an initial template for their alignment. the resolution was restricted to 30 å and c3 symmetry was applied at this stage. next, the refined coordinates were used to extract 1,954 postfusion s from the 2 × binned tomograms into boxes of 160×160×160 voxels for gold-standard alignment. subsequent alignment achieved 15.3 å resolution. for the rnp reconstruction, 18,500 manually picked rnps were extracted into subtomograms of 80×80×80 voxels from 4 × binned tomograms and globally aligned using a large sphere (radius 36 pixels) as the template. the resolution was restricted to 40 å and no symmetry was applied at this stage. lipid bilayers were visible in the aligned maps, suggesting that part of the rnps are packed close to the membrane. the alignment was repeated using a small spherical mask (radius 18 pixels). a reverse "g"-shaped structure appeared after this stage and the refined coordinates were used to extract particles from the 2 × binned tomograms into boxes of 128×128×128 voxels. gold-standard alignment was applied to refine the rnp to a final resolution of 13.1 å. to analyze the local pattern of the rnp assembly, the picked rnps' coordinates were imported into the relion subtomogram averaging pipeline (bharat and scheres, 2016). the rnp particles were extracted and projected into 2d images.
three characteristic patterns of the 2d classification were selected and subjected to 3d initial model generation and 3d classification: 1) closely packed towards the envelope, 2) hexagonally packed and 3) triangularly packed rnps. following 3d refinement, the first class converged only on the membrane. the second class aligned into a hexagonally packed, membrane-proximal rnp assembly, and the third class aligned into a tetrahedrally packed, membrane-free assembly. refined coordinates and orientations of the hexagonal particles ( three representative sars-cov-2 virus (figures 1b and 4d) and a bundle of postfusion s (figure 3b) were reconstructed by projecting all spikes and rnps onto their refined coordinates and merging the structures using jsubtomo (huiskonen et al., 2014). for other map projections to coordinates (figures 1d, s2c and s5b), the 'dtplot' function in dynamo was used. ucsf chimera (pettersen et al., 2004) and chimerax (goddard et al., 2018) were used for rendering the graphics. atomic models (pdb accession codes 6xr8, 6vyb, 6xra) of the pre- and postfusion s were rigidly fitted to the corresponding densities using the fit in map tool (pettersen et al., 2004). the rnp map was segmented into five reverse l-shaped units, which can be further ungrouped into 7 segments above and 10 segments on the base. according to the previous small-angle x-ray scattering (saxs) (chang et al., 2009) and cryo-em (gui et al., 2017) reports, the n_ntd and n_ctd possibly form a reverse l-shaped unit; the n_ctd dimer was suggested to be an assembly unit of the rnp (chen et al., 2007). one segment above and two on the base forming a reverse "l" from the best solved region were selected, and were fitted with an n_ctd dimer (6wji) using the 'fit to segments' tool (pintilie et al., 2010) in ucsf chimera. the segment with the best fitting score (0.78 against 0.74 and 0.72) was adopted as the n_ctd.
the other two segments were fitted with the n_ntd monomer (6wkp, score 0.92 and 0.90). with the reverse "l"-shaped n_ntd-ctd pair formed, we fitted the rest of the rnp with four such units, leaving two upper segments unoccupied. together, the map was interpreted as a decamer of n. molecular dynamics flexible fitting (mdff) (trabuco et al., 2009) was applied to improve the fitting of the atomic model to the s in rbd down conformation. pdb: 6xr8 was prepared in vmd (humphrey et al., 1996) for the mdff, which was performed in vacuum until convergence using namd 2.12 (phillips et al., 2005). statistics were performed using the python package scipy. the aligned location and orientation were used for the statistics of the spike tilt angles. for a given spike on a given virus, the nearest envelope mesh to the spike stem end was found. the angle between vector a (the spike's z axis) and vector b (normal to the envelope mesh) was calculated. for the statistics in fig s1c, envelopes of 113 virions before ultracentrifugation and 157 after ultracentrifugation were fitted by ellipses to estimate their diameters. 382 virions with more than 5 tetrahedron/hexon rnp assemblies were included for the statistics shown in figure 4e. they were sorted into three classes according to the ratio of the longest to the shortest axis: spherical (long/short axis: cryo-electron tomography of mouse hepatitis virus: insights into the structure of the coronavirion in situ structural analysis of sars-cov-2 spike reveals flexibility mediated by three hinges activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites resolving macromolecular structures from electron cryotomography data using subtomogram averaging in relion distinct conformational states of sars-cov-2 spike protein.
science differential processing of hiv envelope glycans on the virus and soluble recombinant trimer dynamo: a flexible, user-friendly development tool for subtomogram averaging of cryo-em data in high-performance computing environments dynamo catalogue: geometrical tools and data management for particle picking in subtomogram averaging of cryo-electron tomograms coronavirus-like particles present in simian faeces multiple nucleic acid binding sites and intrinsic disorder of severe acute respiratory syndrome coronavirus nucleocapsid protein: implications for ribonucleocapsid protein packaging structure of the sars coronavirus nucleocapsid protein rna-binding dimerization domain suggests a mechanism for helical packaging of viral rna a neutralizing human antibody binds to the n-terminal domain of the spike protein of sars-cov-2 viral architecture of sars-cov-2 with post-fusion spike revealed by cryo-em analysis of the post-fusion structure of the sars-cov spike glycoprotein development of an inactivated vaccine candidate for sars-cov-2 meeting modern challenges in visualization and analysis electron microscopy studies of the coronavirus ribonucleoprotein complex implementation of a cryo-electron tomography tilt-scheme optimized for high resolution subtomogram averaging patientderived mutations impact pathogenicity of sars-cov-2. 
medrxiv influenza virus pleiomorphy characterized by cryoelectron tomography controlling the sars-cov-2 spike glycoprotein conformation sars-cov-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor averaging of viral envelope glycoprotein spikes from electron cryotomography reconstructions using jsubtomo vmd: visual molecular dynamics neutralizing antibodies inhibit chikungunya virus budding at the plasma membrane the architecture of sars-cov-2 transcriptome stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis computer visualization of three-dimensional image data using imod structure of the sars-cov-2 spike receptor-binding domain bound to the ace2 receptor acidic ph-induced conformations and lamp1 binding of the lassa virus glycoprotein spike electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-em molecular architecture of native hiv-1 gp120 trimers shielding and beyond: the roles of glycans in sars-cov-2 spike protein genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding all-atom empirical potential for molecular modeling and dynamics studies of proteins automated electron microscope tomography using robust prediction of specimen movements purification and electron cryomicroscopy of coronavirus particles supramolecular architecture of severe acute respiratory syndrome coronavirus revealed by electron cryomicroscopy a structural analysis of m protein in coronavirus assembly and morphology immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen ucsf chimera -a visualization system for exploratory research and analysis scalable molecular dynamics with namd quantitative analysis of cryo-em density map segmentation by watershed and scale-space filtering, and fitting of structures by alignment to regions 
structural and functional analysis of a potent sarbecovirus neutralizing antibody common features of enveloped viruses and implications for immunogen design for next-generation vaccines identification of n-linked carbohydrates from severe acute respiratory syndrome (sars) spike glycoprotein structural basis of receptor recognition by sars-cov-2 characterization of severe acute respiratory syndrome-associated coronavirus (sars-cov) spike glycoproteinmediated viral entry proteolytic activation of the sarscoronavirus spike protein: cutting enzymes at the cutting edge of antiviral research cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace2 sars-cov-2 structure and replication characterized by in situ cryo-electron tomography site-specific glycosylation of virion-derived hiv-1 env is mimicked by a soluble trimeric immunogen molecular dynamics flexible fitting: a practical guide to combine cryo-electron microscopy and x-ray crystallography efficient 3d-ctf correction for cryoelectron tomography using novactf improves subtomogram averaging resolution to 3.4a structure, function, and antigenicity of the sars-cov-2 spike glycoprotein unexpected receptor functional mimicry elucidates activation of coronavirus fusion structure and assembly of the ebola virus nucleocapsid structural and functional basis of sars-cov-2 entry by using human ace2 site-specific glycan analysis of the sars-cov-2 spike vulnerabilities in coronavirus glycan shields despite extensive glycosylation structure of the lassa virus glycan shield provides a model for immunological resistance cryo-em structure of the 2019-ncov spike in the prefusion conformation a new coronavirus associated with human respiratory disease in china inhibition of sars-cov-2 (previously 2019-ncov) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion a thermostable, closed sars-cov-2 
spike protein trimer structural basis for the recognition of sars-cov-2 by full-length human ace2 gctf: real-time ctf determination and correction motioncor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy a pneumonia outbreak associated with a new coronavirus of probable bat origin structures and distributions of sars-cov-2 spike proteins on intact virions key: cord-328289-3h3kmjlz authors: iadecola, costantino; anrather, josef; kamel, hooman title: effects of covid-19 on the nervous system date: 2020-08-19 journal: cell doi: 10.1016/j.cell.2020.08.028 sha: doc_id: 328289 cord_uid: 3h3kmjlz summary neurological complications have emerged as a significant cause of morbidity and mortality in the ongoing covid-19 pandemic. beside respiratory insufficiency, many hospitalized patients exhibit neurological manifestations, ranging from headache and loss of smell, to confusion and disabling strokes. covid-19 is also anticipated to take a toll on the nervous system in the long term. here we will provide a critical appraisal of the potential for neurotropism and mechanisms of neuropathogenesis of sars-cov-2, as they relate to the acute and chronic neurological consequences of the infection. finally, we will examine potential avenues for future research and therapeutic development.
there is increasing evidence that the nervous system is frequently involved in patients hospitalized with coronavirus disease 2019 . this is not surprising since neurological manifestations have long been described also in infections from other respiratory viruses, including coronaviruses (bergmann et al., 2006) . however, the neurological manifestations of covid-19 are common and disabling enough to have attracted widespread attention in the scientific and lay press for their short-and long-term impact on population health (pleasure et al., 2020; wenner moyer, 2020) . a large body of clinical data from tertiary referral centers is rapidly accumulating on this topic worldwide, often with conflicting observations, partly reflecting the preliminary and incomplete nature of the available data. here, we provide a succinct summary of the nervous system involvement in covid-19. in particular, we will focus on the mechanisms of pathogenicity, on the acute and delayed neurological manifestations reported to date, and on how the nervous system involvement compares to that of other respiratory viruses. finally, we will attempt to flesh out caveats and unanswered questions that may help gain a better appreciation of this critical aspect of covid-19 and chart a path forward to minimize its harmful nervous system involvement. an ace2-dependent manner (song et al., 2020) . in brain cells derived from human pluripotent stem cells, dopaminergic neurons, but not cortical neurons or microglia, were particularly susceptible to sars-cov-2 infection . clinical-pathological studies that have tested for the presence of the virus in the brain or the cerebrospinal fluid (csf) have had mixed results. some studies have shown sars-cov-2 rna in brain post-mortem or in the csf in patients with encephalopathy or encephalitis, but at very low levels (moriguchi et al., 2020; solomon et al., 2020) .
other studies could not detect viral invasion, even though there was evidence of csf inflammation (bernard-valnet et al., 2020; ye et al., 2020) . considering the inconsistent data and the low levels of viral rna, when detected, the possibility of artifact or contamination has been raised (solomon et al., 2020) . potential routes of brain entry: examination of how the virus could enter the nervous system may help assess the likelihood for direct invasion and pathogenicity. based on other coronaviruses, several potential routes of entry for sars-cov-2 have been proposed (bergmann et al., 2006) . olfactory route: infection of olfactory system is consistent with the observation that loss of smell is a frequent neurological manifestation in covid-19 (see neurological manifestations of and with evidence of increased mri signal in the olfactory cortex suggestive of infection (politi et al., 2020) . the virus could be internalized in nerve terminals by endocytosis, transported retrogradely, and spread trans-synaptically to other brain regions, as described for other coronaviruses (dubé et al., 2018) . ace2 and tmprss2 have been detected in the nasal mucosa at the rna and protein levels, but they seem to be localized to epithelial cells (sustentacular cells), not olfactory neurons (brann et al., 2020) , although another report suggests neuronal involvement (nampoothiri et al., 2020) . therefore, it is unclear if the virus is restricted to the olfactory epithelium or reaches olfactory neurons. blood-brain barrier: the bbb is a common route of entry of blood-borne viruses into the brain (bergmann et al., 2006) . in covid-19, dissemination of the virus into the blood has been described, albeit with widely ranging frequencies (1% to 41%) (wang et al., 2020c; zheng et al., 2020) , and the virus could access the brain by crossing the bbb. 
crossing the intact bbb would require internalization and transport of the virus across the cerebral endothelium, in which the expression of sars-cov-2 docking proteins remains unclear ( figure 1 ). ace2 immunoreactivity was observed in brain vessels of a patient who died with multiple ischemic infarcts but the cellular localization was not determined (bryce et al., 2020) . the possibility of entry through other putative sars-cov-2 receptors expressed more widely in the cerebral vasculature, such as nrp1 and bsg, cannot be ruled out (cantuti-castelvetri et al., 2020) . on the other hand, sars-cov-2-associated cytokines, including il-6, il-1β, tnf, and il-17 disrupt the bbb (erickson and banks, 2018) and could facilitate the entry of the virus ( figure 2 ). sars-cov-2 has been postulated to induce endothelial infection and inflammation in peripheral vessels (teuwen et al., 2020) , but direct evidence in cerebral endothelial cells has not been thus far provided. rather, a lack of florid cerebrovascular inflammation has been noted in several autopsy studies (bryce et al., 2020; kantonen et al., 2020; reichard et al., 2020; solomon et al., 2020) . comorbidities often seen in covid-19, including cardiovascular risk factor or pre-existing neurological diseases, could, alone or in combination with cytokines, increase bbb permeability (erickson and banks, 2018) . for example, in a covid-19 patient with parkinson's disease, electron microscopy revealed viral particles in frontal lobe microvessels and neurons, suggesting trans-endothelial entry (paniz-mondolfi et al., 2020) . another parkinson's disease patient with obesity, hypertension and diabetes, exhibited at autopsy, in addition to hypoxic-ischemic neuronal damage, microhemorrhages, white matter lesions and enlarged perivascular spaces, but no evidence of sars-cov-2 in the brain (kantonen et al., 2020) . 
sars-cov-2 could also enter the brain through the median eminence of the hypothalamus and other circumventricular organs, brain regions with a leaky bbb due to openings (fenestrae) in the capillary wall (kaur and ling, 2017). although the size of the viral particle (80-120 nm) is larger than endothelial fenestrae (sarin, 2010), preliminary data suggest that median eminence capillaries and tanycytes express ace2 and tmprss2, which could allow virus entry into the hypothalamus (nampoothiri et al., 2020). owing to its widespread connections, the hypothalamus could serve as a gateway to the entire brain. infiltration of infected immune cells: viruses can enter the brain carried by infected immune cells, which can also serve as a reservoir (bergmann et al., 2006). monocytes, neutrophils and t cells traffic into the brain through the vasculature, the meninges and the choroid plexus (engelhardt et al., 2017), and these sites could be entry points for infected immune cells. conclusive evidence of infection of immune cells by sars-cov-2 has not been provided thus far (merad and martin, 2020). sars-cov-2 envelope np protein immunoreactivity was observed in cd68+ cells in lymphoid organs (chen et al., 2020a), while single-cell rna seq data showed viral rna in macrophages in bronchoalveolar lavage of covid-19 patients (bost et al., 2020). however, it remains unclear if this is due to actual virus propagation in macrophages or to phagocytic uptake of virus-infected cells or extracellular virions (bost et al., 2020; merad and martin, 2020). furthermore, several autopsy series have revealed a notable lack of immune cell infiltration (kantonen et al., 2020; reichard et al., 2020; solomon et al., 2020). in summary, sars-cov-2 can infect neurons in vitro and cause neuronal death, but data from csf and autopsy studies do not provide consistent evidence of direct cns invasion.
however, effects on the median eminence and other circumventricular organs, cannot be ruled out and may play a role in the systemic manifestations of the disease. lung damage and respiratory failure: the lung is the organ most affected in covid-19, with massive alveolar damage, edema, inflammatory cell infiltration, microvascular thrombosis, microvascular damage and hemorrhage (carsana et al., 2020) . sars-cov-2 has been detected mainly in pneumocytes and epithelial progenitors (bost et al., 2020; carsana et al., 2020) . the respiratory failure resulting from lung damage leads to severe hypoxia (adult respiratory distress syndrome, ards), requiring assisted ventilation (grasselli et al., 2020) . consistent with hypoxic brain injury, autopsy studies in covid-19 have shown neuronal damage in brain regions most vulnerable to hypoxia, including neocortex, hippocampus and cerebellum (kantonen et al., 2020; reichard et al., 2020; solomon et al., 2020) . a key feature of covid-19 is a maladaptive immune response characterized by hyperactivity of innate immunity followed by immunosuppression (diao et al., 2020; qin et al., 2020; vabret et al., 2020; zhou et al., 2020) . improvement of t-cell function coincides with remission of symptoms and declining viral loads (thevarajan et al., 2020) , attesting to the link between immuno-suppression and disease severity. in patients with severe disease, the cytokine release syndrome can develop (qin et al., 2020; xu et al., 2020) . most covid-19 patients exhibit increased circulating levels of il-6 il-1β, and tnf, as well as il-2, il-8, il-17, g-csf, gm-csf, ip10, mcp1, and mip1α2, and serum levels of il-6 and tnf reflect disease severity (diao et al., 2020) . even in the absence of sars-cov-2 brain invasion, viral proteins shed in the circulation and molecular complexes from damaged cells, such as the nuclear protein high mobility group box 1 (hmgb1) (chen et al., 2020b), could enter the brain through a compromised bbb ( figure 2 ). 
after brain entry, these molecules could act as pathogen-associated molecular patterns (pamps) and damage-associated molecular patterns (damps), and induce an innate immune response in pericytes, brain-resident macrophages and microglia, which express toll-like receptors (tlr) (figure 2). tlr2 mediates the pro-inflammatory effects of sars-cov spike protein on human macrophages through nf-κb (dosch et al., 2009). such an innate immune response increases cytokine production and impairs brain function (dantzer, 2018). in mice, viral infections increase circulating levels of ifnα/β, leading to activation of ifnr1 on cerebral endothelial cells and cxcl10-cxcr3-mediated cognitive impairment (cytokine sickness behavior) (dantzer, 2018). an ifn type i response does occur in covid-19 and is thought to be protective (merad and martin, 2020), but could contribute to the alterations in consciousness (see neurological manifestation of the hypothalamus: target and culprit of immune dysregulation: the brain, the hypothalamus in particular, could also contribute to the immune dysregulation (figure 3). several cytokines upregulated in covid-19 (il-6, il-1β, tnf) are powerful activators of the hypothalamic-pituitary-adrenocortical (hpa) axis (dantzer, 2018). the hpa axis is central to the regulation of systemic immune activity and is activated by bbb dysfunction and neurovascular inflammation (dantzer, 2018). as mentioned above, covid-19 is associated with immunosuppression and lymphopenia. in stroke and brain trauma, adrenergic stress involving β-adrenergic receptors results in massive systemic immunosuppression. the mechanisms of these effects involve activation of the hpa axis, leading to the release of norepinephrine and glucocorticoids. these mediators act synergistically to induce splenic atrophy, t cell apoptosis, and nk cell deficiency.
in the bone marrow, tyrosine hydroxylase and norepinephrine trigger a response in mesenchymal stromal cells, most likely through β3-adrenergic receptors, resulting in a reduction of cell retention. downregulation of these factors, in concert with calprotectin release from damaged lungs, may increase hematopoietic stem cell proliferation skewed towards the myeloid lineage (emergency myelopoiesis) (schulte-schrepping et al., 2020; silvin et al., 2020), which results in lymphopenia and neutrophilia, two key hematological features of covid-19 (chen et al., 2020a; moriguchi et al., 2020; qin et al., 2020) (figure 3). importantly, in sars, hpa activation and glucocorticoid levels are correlated with neutrophilia and lymphopenia (panesar et al., 2004). hypercoagulable state: another key feature of covid-19 is a profound coagulopathy responsible for some of the most frequent and harmful complications of the disease. in a multicenter study, 88% of patients exhibited evidence of a hypercoagulable state (helms et al., 2020b). covid-19 coagulopathy is characterized by a distinctive pro-coagulant state with increased clot strength, increased d-dimers (fibrin breakdown products indicative of intravascular thrombosis), and increased fibrinogen, without significant changes in the number of platelets or prolongation of clotting time parameters (helms et al., 2020b). coagulopathy and thrombosis may start in the lungs and other infected organs with endothelial damage, complement activation, the procoagulant action of il-6, and neutrophil recruitment (goshua et al., 2020; ramlall et al., 2020). in turn, neutrophils release extracellular traps (nets) in covid-19 (middleton et al., 2020), a lattice of chromatin and histones that activates clotting, which contributes to intravascular thrombosis by trapping cells and platelets in many organs including the brain. systemic organ failure: covid-19 also damages other organs.
metabolic and pathological evidence of damage to the kidney, heart, liver, gastrointestinal tract, and endocrine organs has been provided (inciardi et al., 2020; pal and banerjee, 2020; pan et al., 2020; su et al., 2020). the resulting systemic metabolic changes, including water and electrolyte imbalance, hormonal dysfunction, and accumulation of toxic metabolites, could also contribute to some of the more non-specific nervous system manifestations of the disease, such as confusion, agitation, and headache. cardiac involvement could impact the brain by reducing cerebral perfusion or, as discussed in the next section, could be an embolic source leading to ischemic strokes. numerous neurological abnormalities have been described in patients with covid-19. these involve the central and peripheral nervous system, range from mild to fatal, and can occur in patients with severe or otherwise asymptomatic sars-cov-2 infection. neurological abnormalities have been described in approximately 30% of patients who required hospitalization for covid-19, 45% of those with severe respiratory illness, and 85% of those with ards (helms et al., 2020a; mao et al., 2020). in patients with mild covid-19, neurological symptoms are mostly confined to nonspecific abnormalities such as malaise, dizziness, headache, and loss of smell and taste (mao et al., 2020), routinely observed in respiratory virus infections such as influenza (chow et al., 2019). while serious neurological complications have been reported in patients with otherwise mild covid-19 (oxley et al., 2020), the most severe complications occur in critically ill patients and are associated with significantly higher mortality (yaghi et al., 2020). encephalopathy and encephalitis: alterations in mental status (confusion, disorientation, agitation, somnolence), collectively defined as encephalopathy, have been consistently reported in various cohorts with covid-19.
altered mental status occurs rarely (<5%) even in covid-19 patients requiring hospitalization for respiratory illness (mao et al., 2020) , but affects the majority of critically ill covid-19 patients with ards (helms et al., 2020a) . a key question is whether this alteration in mental status represents an encephalopathy caused by systemic illness or an encephalitis directly caused by the sars-cov-2 virus itself. several cases have been reported of covid-19 patients (efe et al., 2020; farhadian et al., 2020; huang et al., 2020b; moriguchi et al., 2020; pilotto et al., 2020) who appear to meet established diagnostic criteria for infectious encephalitis, which include altered mental status, fever, seizures, white blood cells in the csf, and focal brain abnormalities on neuroimaging (venkatesan et al., 2013) . in at least two reported cases, sars-cov-2 was detected in the csf (huang et al., 2020b; moriguchi et al., 2020) although, as discussed in the previous section (nervous system invasion), only modest amounts of viral rna were detected. in at least one covid-19 case, the diagnosis of temporal lobe encephalitis was confirmed by biopsy which showed perivascular lymphocytic infiltrates and hypoxic neuronal damage (efe et al., 2020) , but the presence of sars-cov2 or other viruses in brain or csf was not documented. indeed, most samples of csf in patients with neurological abnormalities in the setting of covid-19 have not revealed evidence of sars-cov-2 (kandemirli et al., 2020) and most samples of brain tissue from autopsies of covid-19 patients have not revealed evidence of encephalitis (see nervous system invasion). besides encephalitis, most covid-19 patients have other reasons for their altered mental status. 
delirium, confusional states, and coma appear most common in covid-19-related critical illness (helms et al., 2020a; mao et al., 2020; rogers et al., 2020), which is often marked by hypoxia, hypotension, renal failure, the need for heavy doses of sedatives, and prolonged immobility and isolation (cummings et al., 2020), all factors well known to cause encephalopathy (maas, 2020). the rarity of cases clinically consistent with encephalitis, the paucity of histopathological evidence of encephalitis, and the many alternative explanations for the altered mental status suggest that sars-cov-2 brain invasion is a possible but rare cause of encephalopathy. ischemic stroke: stroke is not uncommon among patients hospitalized with covid-19, with reported rates ranging from 1% to 3% in hospitalized patients and up to 6% in critically ill patients (mao et al., 2020; merkler et al., 2020; yaghi et al., 2020), 7-fold higher than in patients hospitalized with influenza even after adjustment for illness severity. early case reports described unusual embolic strokes in otherwise young healthy individuals with covid-19 (oxley et al., 2020), but in subsequent case series patients were generally older and had numerous vascular comorbidities (lodigiani et al., 2020). therefore, it remains unclear whether these strokes were caused by sars-cov-2 or represented the background incidence of stroke in this high-risk population that also happened to be infected at the time. it is plausible that sars-cov-2 infection does play some role in causing stroke, given that infections in general increase stroke risk. the covid-19-related hypercoagulability would be expected to increase susceptibility to cerebrovascular events, as reported in an autopsy series in which widespread microthrombi and patches of infarction were observed in some brains (bryce et al., 2020). patients with covid-19 may be at risk of cardioembolic stroke.
acute cardiac injury and clinically significant arrhythmias have been reported in approximately 10% of hospitalized covid-19 patients and 20-40% of those requiring intensive care (huang et al., 2020a; wang et al., 2020a). sars-cov-2 infection may rarely cause myocarditis and heart failure even in the absence of significant pulmonary involvement (inciardi et al., 2020). myocardial injury and arrhythmias, such as atrial fibrillation, in the setting of severe infection may result in cardiac embolism and brain infarction (inciardi et al., 2020). a substantial proportion of critically ill patients with covid-19 may also develop secondary bacteremia in addition to the primary viral illness. in one case series, approximately 10% of patients requiring mechanical ventilation had bacteremia, which increases the risk of stroke more than 20-fold (dalager-pedersen et al., 2014). septic emboli to the brain often result in bleeding, and in a postmortem magnetic resonance imaging study, 10% of brains had evidence of hemorrhage (coolen et al., 2020). taken together, these clinical findings suggest that sars-cov-2 may adversely affect the brain via multiple pathophysiological pathways that culminate in vascular brain injury. post-infectious neurological complications: sars-cov-2 unleashes a dysregulated systemic immune response (see systemic inflammation and immune dysregulation), which can have delayed effects on the nervous system. these immune-mediated manifestations involve both the central and peripheral nervous system and typically occur after the acute phase of the infection subsides. in the cns, reported cases in covid-19 resemble classic post-infectious inflammatory conditions such as acute disseminated encephalomyelitis (parsons et al., 2020) and acute necrotizing hemorrhagic encephalopathy (poyiadji et al., 2020).
peripherally, several cases of guillain-barre syndrome, a neuropathy caused by an immune attack on peripheral nerves, have been reported in patients with recent covid-19 (toscano et al., 2020). most reported cases describe classic features of this syndrome, such as generalized weakness, evidence of demyelination on nerve conduction studies and elevated proteins without white blood cells in csf (toscano et al., 2020). the miller-fisher variant of guillain-barre syndrome, characterized by cranial nerve involvement, has also been reported, including at least one case with detectable anti-ganglioside antibodies suggesting an immune attack on the peripheral nerves (gutierrez-ortiz et al., 2020). sars-cov-2 was not detected in any of the csf samples (toscano et al., 2020), supporting an immune mechanism rather than direct infection. intensive care related neurological manifestations: the relatively high frequency of altered mental status in hospitalized covid-19 patients is congruent with the severity of their illness. most critically ill covid-19 patients require mechanical ventilation (cummings et al., 2020), and an agitated confusional state (delirium) occurs in more than 80% of mechanically ventilated patients in intensive care units (ely et al., 2001). patients with ards, which frequently complicates severe covid-19, are at particularly high risk of delirium, likely because of hypoxemia, heavy doses of sedatives, administration of paralytic agents or other causes (hopkins et al., 2005; ouimet et al., 2007). comparison with other viral respiratory infections: many neurological abnormalities seen in covid-19 mirror those of other viral respiratory illnesses.
all of the reported covid-19 related post-infectious inflammatory conditions of the nervous system, such as guillain-barre syndrome, acute necrotizing hemorrhagic encephalopathy and acute disseminated encephalomyelitis, are classically seen after infections, including those caused by other coronaviruses (gerges harb et al., 2020). influenza is occasionally associated with an encephalopathy or full-blown encephalitis, with evidence of influenza virus in the cerebrospinal fluid (surtees and desousa, 2006). given the large numbers of patients infected by sars-cov-2 worldwide and the relative paucity of reported encephalitis cases, sars-cov-2 seems more similar to other common respiratory viral pathogens like influenza than to neurotropic pathogens that specifically target the brain, such as the herpes simplex virus. in general, however, covid-19 is more debilitating than other common viral respiratory illnesses. physicians have been struck by the frequency of thrombotic complications observed in critically ill covid-19 patients, to the point that some hospitals instituted protocols for empiric, high-dose anticoagulation in patients with elevated d-dimer levels (paranjpe et al., 2020). emerging data seem to confirm this observation: in one multicenter study, patients with covid-19 and acute respiratory distress syndrome had twice the incidence of thrombotic complications compared to a matched cohort with ards from other causes (helms et al., 2020b). this also applies to thrombotic complications affecting the brain, since the proportion of covid-19 related hospitalizations complicated by stroke seems much higher than that seen in influenza. based on neuroinflammation-associated abnormalities in the clotting cascade in brain (han et al., 2008), activated protein c or thrombin inhibitors could also be of therapeutic value.
the findings reviewed above indicate that neurological manifestations are common in covid-19 and constitute a defining aspect of the symptomatology of the disease. a caveat is that most clinical data is derived from case series on patients ill enough to require hospitalization at tertiary care centers, providing a biased representation of the frequency and type of the neurological manifestations. similarly, basic science investigations exploring the mechanism of disease have largely emphasized concepts and findings that emerged from other coronaviruses, and there is limited new data on the interaction of sars-cov-2 with the brain and its vasculature. therefore, conclusions based on the existing literature have to be considered preliminary and subject to further scrutiny, verification and validation. here are some of the outstanding questions: do the neurological manifestations of covid-19 reflect brain invasion? the encephalopathy is most likely a consequence of systemic factors, such as cytokine sickness, hypoxia and metabolic dysfunction due to peripheral organ failure, while the strokes seem to be related more to hypercoagulability and endothelial injury than to sars-cov-2 vasculitis affecting brain vessels. the loss of taste and smell has been attributed to invasion of the olfactory neural system, but consistent evidence is lacking. in some cases, the possibility of a sars-cov-2 encephalitis could not be ruled out based on the potential for the virus to infect neurons (song et al., 2020), but definitive clinical and pathological evidence of neurotropism is lacking. a major problem is that the molecular mechanisms of cellular entry for sars-cov-2 are not entirely clear. while ace2 is thought to be the main receptor in some cell types, its expression levels do not seem to correlate with the infectivity potential. for example, the virus gains access to human pluripotent stem cell-derived dopaminergic neurons despite low levels of ace2.
systematic investigation of non-canonical docking and accessory proteins for sars-cov-2 (figures 1, s1), their cellular localization and function in human neurons, glia and vascular cells would help address this question. does the brain contribute to the immune dysregulation? sars-cov-2 and inflammatory mediators may gain access to the median eminence and activate hypothalamic neurohumoral pathways that mediate immune dysregulation through the adrenergic system, as described in other brain diseases (figure 3). considering the importance of the immune dysregulation in covid-19 severity and outcome, a better understanding of the contribution of the hypothalamus may suggest pharmacological approaches to dampen the immune dysregulation. does the brain contribute to respiratory failure and hypertension? similarly, entry of the virus and/or proinflammatory molecules through the subfornical organ and the area postrema could also affect brainstem autonomic pathways controlling blood pressure and breathing (kaur and ling, 2017). alterations in blood pressure, both hypertension and severe hypotension in critically ill patients, are highly prevalent in covid-19. furthermore, it has been suggested that involvement of brainstem respiratory nuclei may contribute to the respiratory failure, but no alterations in respiratory centers or chemoreceptors (carotid bodies) were observed at autopsy in a patient with respiratory dysregulation (kantonen et al., 2020). to date, evidence of central autonomic involvement is lacking. what are the long-term neurological and neuropsychiatric consequences of covid-19? respiratory virus infections are associated with neurological and psychiatric sequelae, including parkinsonism, dementia, depression, post-traumatic stress disorder and anxiety (limphaibool et al., 2019; rogers et al., 2020). brain infection is not required for these long-term effects.
inflammation and cytokine elevation in sepsis survivors are linked to subsequent hippocampal atrophy and cognitive impairment (iwashyna et al., 2010). experimental studies suggest a link between activation of the nlrp3 inflammasome, which may occur in covid-19, and alzheimer pathology (ising et al., 2019). ards survivors also exhibit increased incidence of long-term depression, anxiety and cognitive impairment (hopkins et al., 2005). whether these late manifestations are related to non-resolving inflammation or a low-grade immune process driven by molecular mimicry or dysregulated adaptive immunity remains to be established. chronic damage to systemic organs can also harm the brain through chronic hypoxia, metabolic dysfunction and hormonal dysregulation. based on these considerations, significant long-term neurological and psychiatric sequelae have to be anticipated in covid-19, especially in survivors of severe disease. experimental models: models would help address these outstanding questions and facilitate therapeutic development. unfortunately, mice, the most popular laboratory animals, are not susceptible to sars-cov-2 due to differences between mouse and human ace2 (lakdawala and menachery, 2020). mice expressing human ace2 have been developed and show evidence of brain infection, but only minimal symptoms of disease (song et al., 2020). hamsters, ferrets, cats and non-human primates could be more viable models (lakdawala and menachery, 2020). reproducing the systemic effects of the disease would be critical for studying the neurological aspects of covid-19. in vitro approaches involving human pluripotent stem cell-derived organoids and co-cultures are useful to examine infectious mechanisms in brain cells (song et al., 2020; yang et al., 2020), but do not provide insight into the harmful systemic effects.
therefore, there is a pressing need to develop animal models that are suitable for investigating not only the effects of sars-cov-2 on brain cells, but also the systemic effects of the infection and the long-term neuropsychiatric consequences. therapeutic considerations: until safe and effective vaccines are developed, therapeutic efforts have to focus on antiviral agents and on how to best manage respiratory insufficiency, organ failure, hypercoagulable state and immune dysregulation. there is no specific treatment for the neurological manifestations, which are managed according to standard protocols. however, since the neurological complications emerge mainly in severe systemic disease, minimizing hypoxia and protecting the brain from cytokines, damps, pamps and thromboembolic complications are important therapeutic goals. immunosuppression with steroids improves mortality in patients with severe disease, but not in those with milder forms (hayden, 2020). furthermore, more nuanced approaches to counteract the immune dysregulation, such as targeting specific cytokines or inflammatory pathways, are also being tested (vabret et al., 2020). whether these interventions reduce the short- and long-term neurological and psychiatric complications remains to be established. in conclusion, the neurological manifestations of covid-19 constitute a major public health challenge not only for the acute effects on the brain, but also for the long-term harm to brain health that may ensue. these delayed manifestations are anticipated to be significant since they are likely to also affect patients who did not show neurological symptoms in the acute phase. therefore, clinical and laboratory efforts aiming to elucidate the mechanisms of the acute effects of sars-cov-2 on the brain need to be coupled with investigations on the deleterious delayed neuropsychiatric sequelae of the infection.
these efforts should be driven by a close cooperation between clinical and basic scientists and take advantage of the wealth of clinical-epidemiological data and biological specimens that are accumulating worldwide. considering that covid-19 is still raging in many countries, including the us, and that there might be a seasonal resurgence of infection, it is imperative that such a concerted effort is implemented swiftly and on a large scale. dr. iadecola serves on the scientific advisory board of broadview ventures. dr. kamel serves as co-pi for the nih-funded arcadia trial (ninds u01ns095869) which receives in-kind study drug from the bms-pfizer alliance for eliquis® and ancillary study support from roche diagnostics, serves as deputy editor for jama neurology, serves as a steering committee member of medtronic's stroke af trial (uncompensated), serves on an endpoint adjudication committee for a trial of empagliflozin for boehringer-ingelheim, and has served on an advisory board for roivant sciences related to factor xi inhibition. dr. anrather has no conflict of interests to declare. in a single nuclear rna-seq profile of human cortical brain tissue (https://celltypes.brain-map.org) (hodge et al., 2019) there was no evidence of ace2 expression in any brain cell type. basigin (bsg) was prominently expressed in pericytes and endothelial cells, while neuropilin-1 (nrp1) was detected in endothelial cells and in several classes of excitatory neurons. low expression of tmprss11a and furin was found in neurons, while ctsb was moderately expressed in most cell types with the exception of astrocytes and oligodendrocytes and their precursors. endothelial cells and pericytes also express lymphocyte antigen 6 family member e (ly6e) and the interferon (ifn)-induced transmembrane proteins 1 and 3 (ifitm1, ifitm3), which have been shown to restrict sars-cov-2 cell entry (hachim et al., 2020; pfaender et al., 2020; zhao et al., 2020).
ifn type i receptors (ifnar1 and ifnar2) showed higher expression in endothelial cells than other cell types. cell cluster annotations are from (hodge et al., 2019). cpm, transcript counts per million within the cell cluster; fraccellexpr, fraction of cells in which the transcript is detected. circulating virus, cytokines, damps and pamps could act on endothelial cells leading to inflammation and opening of the bbb. once in the perivascular space, these factors could induce inflammation in vascular mural cells and brain resident myeloid cells (microglia and macrophages). the resulting cytokine production could affect neuronal function leading to the cytokine sickness, a potential cause of encephalopathy in covid-19. cytokine and sars-cov-2 entry into the median eminence of the hypothalamus could lead to activation of the autonomic nervous system and release of adrenal catecholamines and steroids. in analogy with stroke, brain trauma and myocardial infarction, these neurohumoral effectors could act on the bone marrow to release immunosuppressor neutrophils and myeloid cells (emergency myelopoiesis), as described in covid-19 (schulte-schrepping et al., 2020), leading to immunosuppression and lymphopenia. in addition, release of calprotectin and cytokines from damaged lungs could also contribute to emergency myelopoiesis (silvin et al., 2020). etoc: iadecola et al. review and discuss the acute and chronic neurological consequences of covid-19, potential mechanisms for neuropathogenesis and the outstanding questions to minimize its harmful nervous system involvement.
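the two per-cluster measures named in the legend above (cpm and fraccellexpr) can be illustrated with a toy calculation. the sketch below is only an assumption about how such values might be derived: it pools all transcript counts in a cluster before scaling to one million, and treats any non-zero count as detection. the function names, the data layout and the toy counts are hypothetical and are not taken from the hodge et al. dataset.

```python
# Toy sketch: per-cluster CPM and fraction of expressing cells.
# The pooling scheme, names and data below are illustrative assumptions.

def cluster_cpm(cells, gene):
    """Counts per million for `gene`, pooling all transcripts in the cluster."""
    total = sum(sum(cell.values()) for cell in cells)
    if total == 0:
        return 0.0
    gene_total = sum(cell.get(gene, 0) for cell in cells)
    return 1e6 * gene_total / total


def frac_cells_expressing(cells, gene):
    """Fraction of cells in the cluster with at least one count for `gene`."""
    if not cells:
        return 0.0
    detected = sum(1 for cell in cells if cell.get(gene, 0) > 0)
    return detected / len(cells)


# Hypothetical two-cell cluster; each dict maps gene symbol -> raw count.
toy_cluster = [
    {"BSG": 5, "NRP1": 5, "ACE2": 0},
    {"BSG": 10, "NRP1": 0},
]
```

reporting both values together helps separate a transcript that is abundant in a few cells from one that is present at low levels in most cells of the cluster.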
[figure gene-panel labels: ace2, bsg, nrp1, tmprss2, tmprss11a, tmprss11b, furin, ctsb, ctsl, ly6e, ifitm1, ifitm2, ifitm3, ifnar1, ifnar2]
sars coronavirus spike protein-induced innate immune response occurs via activation of the nf-kappab pathway in human monocyte macrophages in vitro axonal transport enables neuron-to-neuron propagation of human coronavirus oc43 covid-19-associated encephalitis mimicking glial tumor delirium in mechanically ventilated patients: validity and reliability of the confusion assessment method for the intensive care unit (cam-icu) the movers and shapers in immune privilege of the cns neuroimmune axes of the blood-brain barriers and blood-brain interfaces: bases for physiological regulation, disease states, and pharmacological interventions acute encephalopathy with elevated csf inflammatory markers as the initial presentation of covid-19 sars, mers and covid-19: clinical manifestations and organsystem complications: a mini review endotheliopathy in covid-19-associated coagulopathy: evidence from a single-centre, cross-sectional study clinical characteristics of covid-19 in baseline characteristics and outcomes of 1591 patients infected with sars-cov-2 admitted to icus of the lombardy region miller fisher syndrome and polyneuritis cranialis in covid-19 interferon-induced transmembrane protein (ifitm3) is upregulated explicitly in sars-cov-2 infected lung epithelial cells tissue distribution of ace2 protein, the functional receptor for sars coronavirus.
a first step in understanding sars pathogenesis proteomic analysis of active multiple sclerosis lesions reveals therapeutic targets effect of dexamethasone in hospitalized patients with covid-19 -preliminary pericyte-specific vascular expression of sars-cov-2 receptor ace2-implications for microvascular inflammation neurologic features in severe sars-cov-2 infection high risk of thrombosis in patients with severe sars-cov-2 infection: a multicenter prospective cohort study conserved cell types with divergent features in human versus mouse cortex sars-cov-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor two-year cognitive, emotional, and quality-of-life outcomes in acute respiratory distress syndrome clinical features of patients infected with 2019 novel coronavirus in wuhan sars-cov-2 detected in cerebrospinal fluid by pcr in a case of covid-19 encephalitis immune responses to stroke: mechanisms, modulation, and therapeutic potential characteristics and outcomes of patients hospitalized for covid-19 and cardiac disease in northern italy nlrp3 inflammasome activation drives tau pathology long-term cognitive impairment and functional disability among survivors of severe sepsis brain mri findings in patients in the intensive care unit with covid-19 infection. 
radiology neuropathologic features of four autopsied covid-19 patients the circumventricular organs the search for a covid-19 animal model integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain the neuroinvasive potential of sars-cov2 may play a role in the respiratory failure of covid-19 patients infectious etiologies of parkinsonism: pathomechanisms and clinical implications clinical and biochemical indexes from 2019-ncov infected patients linked to viral loads and lung injury venous and arterial thromboembolic complications in covid-19 patients admitted to an academic hospital in critical medical illness and the nervous system neurologic manifestations of hospitalized patients with coronavirus disease pathological inflammation in patients with covid-19: a key role for monocytes and macrophages risk of ischemic stroke in patients with coronavirus disease 2019 (covid-19) vs patients with influenza neutrophil extracellular traps (nets) contribute to immunothrombosis in covid-19 acute respiratory distress syndrome a first case of meningitis/encephalitis associated with sars-coronavirus-2 the hypothalamus as a hub for sars-cov-2 brain infection and pathogenesis incidence, risk factors and consequences of icu delirium large-vessel stroke as a presenting feature of covid-19 in the young covid-19 and the endocrine system: exploring the unexplored clinical characteristics of covid-19 patients with digestive symptoms in hubei, china: a descriptive, cross-sectional, multicenter study lymphopenia and neutrophilia in sars are related to the prevailing serum cortisol central nervous system involvement by severe acute respiratory syndrome coronavirus-2 (sars-cov-2) association of treatment dose anticoagulation with in-hospital survival among hospitalized patients with covid-19 inflammation, autoimmunity, infection, and stroke: epidemiology and lessons from therapeutic intervention covid-19-associated acute disseminated 
encephalomyelitis (adem) ly6e impairs coronavirus fusion and confers immune control of viral disease. biorxiv steroid-responsive encephalitis in coronavirus disease the spectrum of neurologic disease in the severe acute respiratory syndrome coronavirus 2 pandemic infection: neurologists move to the frontlines magnetic resonance imaging alteration of the brain in a patient with coronavirus disease 2019 (covid-19) and anosmia covid-19-associated acute hemorrhagic necrotizing encephalopathy: imaging features dysregulation of immune response in patients with coronavirus immune complement and coagulation dysfunction in adverse outcomes of sars-cov-2 infection neuropathology of covid-19: a spectrum of vascular and acute disseminated encephalomyelitis (adem)-like pathology psychiatric and neuropsychiatric presentations associated with severe coronavirus infections: a systematic review and meta-analysis with comparison to the covid-19 pandemic physiologic upper limits of pore size of different blood capillary types and another perspective on the dual pore theory of microvascular permeability cell entry mechanisms of sars-cov-2 elevated calprotectin and abnormal myeloid cell subsets discriminate severe from mild covid-19 neuropathological features of covid-19 neuroinvasive potential of sars-cov-2 revealed in a human brain organoid model. 
biorxiv renal histopathological analysis of 26 postmortem findings of patients with covid-19 in china influenza virus associated encephalopathy covid-19: the vasculature unleashed breadth of concomitant immune responses prior to patient recovery: a case report of non-severe covid-19 guillain-barre syndrome associated with sars-cov-2 immunology of covid-19: current state of the science case definitions, diagnostic algorithms, and priorities in encephalitis: consensus statement of the international encephalitis consortium clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in sars-cov-2 invades host cells via a novel route: cd147-spike protein detection of sars-cov-2 in different types of clinical specimens can covid damage the brain? pathological findings of covid-19 associated with acute respiratory distress syndrome sars-cov-2 and stroke in a new york healthcare system a human pluripotent stem cell-based platform to study sars-cov-2 tropism and model virus infection in human cells and organoids encephalitis as a clinical manifestation of covid-19 ly6e restricts the entry of human coronaviruses, including the currently pandemic sars-cov-2 viral load dynamics and disease severity in patients infected with sars-cov-2 in zhejiang province heightened innate immune responses in the respiratory tract of covid-19 patients the authors are supported by nih grants r01-ns34179, r01-ns100447, r37-ns089323, r01-ns095441, r01-ns/hl37853 (ci), r01ns097443 (hk), ns094507, ns081179 (ja), key: cord-321868-xk4yuibj authors: belcourt, michael f.; farabaugh, philip j. title: ribosomal frameshifting in the yeast retrotransposon ty: trnas induce slippage on a 7 nucleotide minimal site date: 1990-07-27 journal: cell doi: 10.1016/0092-8674(90)90371-k sha: doc_id: 321868 cord_uid: xk4yuibj abstract ribosomal frameshifting regulates expression of the tyb gene of yeast ty retrotransposons. 
we previously demonstrated that a 14 nucleotide sequence conserved between two families of ty elements was necessary and sufficient to support ribosomal frameshifting. this work demonstrates that only 7 of these 14 nucleotides are needed for normal levels of frameshifting. any change to the sequence cuu-agg-c drastically reduces frameshifting; this suggests that two specific trnas, trnaleu uag and trnaarg ccu, are involved in the event. our trna overproduction data suggest that a leucyl-trna, probably trnaleu uag, an unusual leucine isoacceptor that recognizes all six leucine codons, slips from cuu-leu onto uua-leu (in the +1 reading frame) during a translational pause at the agg-arg codon induced by the low availability of trnaarg ccu, encoded by a single-copy essential gene. frameshifting is also directional and reading frame specific. interestingly, frameshifting is inhibited when the "slip" cuu codon is located three codons downstream, but not four or more codons downstream, of the translational initiation codon. establishment of the translational reading frame in eukaryotes occurs during initiation of protein synthesis when the first aug codon in the mrna is located by a component of the scanning 40s initiation complex, the initiator trna, trnai(met) (cigan et al., 1988). following 60s ribosome assembly, translation continues in 3 nucleotide steps with a very high degree of accuracy, reflecting the fact that maintenance of the translational reading frame is essential for useful gene expression. expression of the tyb gene of yeast ty retrotransposons occurs by disruption of this highly accurate translocation mechanism at levels approaching 50% (clare et al., 1988; wilson et al., 1986). how is this level of "inaccuracy" achieved? the ty1 and ty2 elements are members of a family of retrotransposons found dispersed throughout the genome of the yeast saccharomyces cerevisiae (cameron et al., 1979).
they consist of 0.33 kb terminal direct repeats called "delta" (δ) flanking a 5.3 kb internal region termed "epsilon" (ε). along with retroviruses of higher eukaryotes (reviewed in varmus, 1983), the copia-like elements of drosophila species (emori et al., 1985; mount and rubin, 1985), and l1md of mice (loeb et al., 1986), ty elements are members of a family of elements that replicate via an rna intermediate. ty elements encode two genes, tya and tyb. ty elements replicate by a retroviral-like mechanism (boeke et al., 1985) within a virus-like particle encoded by the products of the tya gene, the analog of the retroviral gag gene (adams et al., 1987; garfinkel et al., 1985; mellor et al., 1985b). the tyb gene includes sequence homologies to retroviral pol genes, which encode the reverse transcriptase, integrase, and protease proteins. as with many avian and mammalian retroviral pol genes, expression of tyb requires ribosomal frameshifting (clare et al., 1988). tyb, like pol, is expressed as a protein fusion to the product of the upstream gene, tya (clare and farabaugh, 1985; mellor et al., 1985a). while retroviral frameshift events occur in the -1 direction in a region of overlap between the gag, pol and sometimes pro genes (jacks et al., 1987, 1988b; jacks and varmus, 1985; moore et al., 1987; wilson et al., 1988), ty frameshift events occur in the +1 direction in the 38-44 bp overlap between tya and tyb. in fact, we recently showed that a 14 bp region of this overlap is necessary and sufficient to promote normal levels of ribosomal frameshifting (clare et al., 1988).
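the +1 slippage model summarized in the abstract can be made concrete with a short reading-frame sketch on the 7 nucleotide site cuu-agg-c. this is an illustration only: the helper names and the tiny codon table are assumptions introduced here, and the glycine codon that follows the slip is simply what the sequence itself implies, not a result reported in the text.

```python
# Illustrative sketch of the +1 slip on the 7 nt site CUU-AGG-C.
# Helper names and the partial codon table are assumptions for this example.

SITE = "CUUAGGC"
CODON_TABLE = {"CUU": "Leu", "UUA": "Leu", "AGG": "Arg", "GGC": "Gly"}


def codons(seq, start=0):
    """Split `seq` into complete triplets, beginning at index `start`."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def plus_one_shift(seq, slip_codon_index):
    """Codons decoded when reading is normal up to and including the codon at
    `slip_codon_index`, after which one nucleotide is skipped (+1 shift)."""
    n = 3 * (slip_codon_index + 1)
    return codons(seq[:n]) + codons(seq, n + 1)


zero_frame = codons(SITE)               # normal reading: CUU, then AGG (pause site)
shifted_frame = plus_one_shift(SITE, 0)  # reading after the slip at CUU
slipped_pairing = SITE[1:4]              # codon the leucyl-tRNA re-pairs with
```

in this sketch the slow-to-decode agg codon is never read in the shifted frame: the leucyl-trna moves from cuu to the overlapping uua, and decoding resumes one nucleotide downstream.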
in rous sarcoma virus (rsv), mouse mammary tumor virus (mmtv), and the avian coronavirus infectious bronchitis virus (ibv), rna secondary structure in the form of a pseudoknot plays a critical role in the ribosomal frameshift event, presumably by causing the ribosome to stall at the site of frameshifting (brierley et al., 1989; jacks et al., 1987, 1988a; moore et al., 1987). the region of frameshifting in ty elements contains no obvious secondary structure. in addition, ty elements lack the characteristic homopolymeric run of nucleotides ("slippery" sequences) at the site of frameshifting that characterizes frameshift sites of retroviruses and coronaviruses (brierley et al., 1989; jacks et al., 1988a). in these systems, a simultaneous slippage of trnas in the a and p sites of the translating ribosome on the homopolymeric sequence results in a frameshift to the pro or pol reading frame and suppression of the gag frame termination codon. this appears not to be the case in ty elements. an in vivo assay for frameshifting all constructions involve the use of a his4a::lacz fusion gene contained on a 2μm dna-based plasmid whose construction is described in experimental procedures (figure 1). this plasmid is present at four copies per cell (farabaugh et al., 1989). oligonucleotides containing various versions of the frameshift sequence were introduced at the bamhi site of pmb25 and between the bamhi and kpni sites of pmb38. transcription from the his4 promoter produces an mrna with two overlapping genes: the 5' proximal gene derived from the first 100 nucleotides of the his4a gene and, in the +1 reading frame, the 3' proximal gene derived from the lacz gene of escherichia coli. production of the lacz gene product, β-galactosidase, depends upon a ribosomal frameshift event in the +1 direction within the sequences introduced on the oligonucleotides.
the rate of frameshifting is measured by determining the ratio of β-galactosidase activity produced from a construct requiring a +1 frameshift to express lacz to that of a construct in which the upstream and downstream genes are fused in frame. as shown below, the rate of frameshifting measured in this way is high, usually in the range of 40%; however, experimental variation in this rate occurs, both upward to the range of 80% and downward in the range of 20%. this variation results from unknown effects of the sequence context of individual constructions and slight variations in the physiology of yeast transformants. we will describe as abnormal only those transformants showing rates of frameshifting substantially lower than 20% (i.e., in the range of 2% or less). sequence is the site of frameshifting we previously demonstrated that a 14 nucleotide sequence conserved between the ty1 and ty2 families of ty elements supports high levels of frameshifting (clare et al., 1988). it is possible that the frameshift event occurs shortly before or after the 14 nucleotide sequence since neither the tya nor the tyb reading frame is limited by translational termination codons in the region of overlap. in the construction used, the first tyb frame termination codon is 72 bp upstream of the overlap in his4a while the first tya frame termination codon is 25 bp downstream in lacz. an oligonucleotide was synthesized that contained the 14 nucleotide frameshift sequence flanked by termination codons, upstream in the tyb reading frame and downstream in the tya reading frame (table 7). the oligonucleotide was cloned at the bamhi site of pmb25 as described in experimental procedures and analyzed for its ability to promote frameshifting by assaying the expression of the downstream gene, lacz. expression of lacz requires a +1 ribosomal frameshift within the sequence between the termination codons.
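the frameshift rate defined above is a simple ratio of enzyme activities, which can be written out as a short calculation. the sketch below assumes activities are given in the same units for both constructs; the function names and the 2% cutoff argument are illustrative choices based on the working definition of "abnormal" stated in the text, and the example activities in the usage note are hypothetical.

```python
# Sketch of the frameshift-rate calculation: activity of the frameshift
# reporter divided by activity of the in-frame fusion control.
# Function names and default cutoff are assumptions for illustration.

def frameshift_percent(reporter_activity, fusion_activity):
    """Percent frameshifting = 100 * reporter / in-frame fusion activity."""
    if fusion_activity <= 0:
        raise ValueError("in-frame fusion activity must be positive")
    return 100.0 * reporter_activity / fusion_activity


def is_abnormal(percent, cutoff=2.0):
    """Flag rates in the range of 2% or less, per the text's working rule."""
    return percent <= cutoff
```

for example, a hypothetical reporter activity of 1200 units against a fusion control of 3000 units gives `frameshift_percent(1200, 3000)`, a rate of 40%, which falls within the normal range described above.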
frameshifting is unaffected by the termination codons, occurring at levels of approximately 25% (data not shown). this result demonstrates that the 14 nucleotide sequence is the actual site of frameshifting. an open question in many of the site-specific frameshifting systems described in prokaryotes and eukaryotes is the degree to which the event is directional and reading frame specific. is frameshifting constrained to occur in only one direction, or can it occur in either direction? frameshifting in the human immunodeficiency virus (hiv-1) occurs only in the -1 direction (wilson et al., 1988). conversely, weiss et al. (1987) demonstrated that ribosomes in e. coli may be made to frameshift both forward and backward within a sequence that incorporates a string of identical nucleotides. shifts of -2, -1, +1, +2, +5, and +6 were identified. will frameshifting occur only when the tya reading frame is being translated by the ribosome? changing the reading frame of e. coli frameshift sites decreases the frequency of frameshifting by about 100-fold (weiss et al., 1987). changing the frame disrupts both a required frame-specific pause induced by a nonsense codon, and a frame-specific trna slippage site. removing the nonsense codon alone causes about a 10-fold decrease in frameshifting. this suggests that reading frame is important in the e. coli case, but not essential. to test explicitly if frameshifting is reading frame specific or directional, we constructed a set of plasmids all of which had a single insert of the minimal region but designed such that translation comes into the region in each of the three reading frames and exits in each of the three reading frames (table 1). to achieve all combinations required constructing nine plasmids. each minimal region was flanked by termination codons to ensure that any frameshifting that occurred was taking place within the minimal sequence.
these constructions were introduced into yeast and assayed for in vivo expression of β-galactosidase. in three of the plasmids (pmb38-(0)fus, pmb38-(+1)fus, and pmb38-(-1)fus), the his4a and lacz translational frames are fused. two of these constructions give high levels of β-galactosidase activity (~3000 u; table 1), but the construction in which translation occurs in the -1 frame of the minimal region expresses about 50-fold lower amounts. the latter construction has an in-frame uag termination codon partway through the frameshift region, and thus translation terminates prematurely. two -1 frameshift reporter constructs (pmb38-(0)-1 and pmb38-(+1)0) do not express significant amounts of β-galactosidase. we conclude that -1 frameshifts do not occur from either the 0 (defined as the wild-type or tya reading frame) or +1 reading frame. by contrast, construct pmb38-(0)+1, in which translation enters the minimal region in the 0 frame (tya) and exits in the +1 frame (tyb), expresses high levels of β-galactosidase activity, indicating about 38% frameshifting as expected. frameshifting from the +1 reading frame to the -1 reading frame (pmb38-(+1)-1) shows insignificant enzyme activity, indicating that +1 frameshifting does not occur from the +1 reading frame of the minimal sequence. to test if frameshifting occurs in the +1 or -1 direction from the -1 reading frame, we transformed an amber suppressor strain, l0861, to allow readthrough past the in-frame uag. in this background, the tya-tyb +1 frameshift reporter plasmid, pmb38-(0)+1, promotes about 48% frameshifting (table 1). frameshifting in the +1 direction from the other two reading frames is not seen. likewise, frameshifting in the -1 direction does not occur from any of the reading frames. we conclude from these results that frameshifting in the 14 nucleotide minimal region occurs only in the +1 direction and only from the 0, or tya, reading frame.
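the logic of the nine-plasmid design above can be sketched in a few lines of python. this is only an illustration under stated assumptions: the function names (required_shift, construct_table, frameshift_percent) are hypothetical and not from the paper, frames are encoded as the integers 0, +1, and -1, and the percent calculation simply restates the ratio of reporter activity to in-frame fusion activity described earlier.

```python
# frames as named in the text: 0 (tya), +1 (tyb), and -1
FRAMES = (0, 1, -1)


def required_shift(entry_frame, exit_frame):
    """Shift the ribosome must make inside the minimal region so that
    lacz, placed in exit_frame, is translated (hypothetical helper)."""
    delta = (exit_frame - entry_frame) % 3  # Python's % is never negative
    return {0: "none (fusion)", 1: "+1", 2: "-1"}[delta]


def construct_table():
    """All nine entry/exit frame combinations with the shift each reports."""
    return [(e, x, required_shift(e, x)) for e in FRAMES for x in FRAMES]


def frameshift_percent(reporter_units, fusion_units):
    """Frameshift rate as reporter activity relative to the in-frame fusion."""
    return 100.0 * reporter_units / fusion_units


for entry, exit_, shift in construct_table():
    print(f"enter {entry:+d}, exit {exit_:+d}: {shift}")
```

three of the nine combinations are in-frame fusions, three report +1 shifts, and three report -1 shifts; according to the results above, only the combination entering in the 0 frame and exiting in the +1 frame gives substantial activity.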
sequence

to determine whether all nucleotides of the 14 nucleotide frameshift site are required for frameshifting, a series of frameshift reporter constructs retaining increasingly smaller portions of the minimal sequence were made by synthesizing oligonucleotides that differ in length by single codons of the tya reading frame. the remaining sequences in each oligonucleotide were flanked by termination codons to ensure that frameshifting occurs only within the sequence. the oligonucleotides were inserted into the his4a::lacz frameshift reporter plasmid pmb25 at the bamhi site as described in experimental procedures. frameshifting in the +1 direction will result in β-galactosidase production from the lacz gene. table 2 shows the sequence of the deletion constructs as well as the results from each construct. deleting the first two codons from the 3' end of the 14 nucleotide sequence (including 5 nucleotides of the 14 nucleotide sequence) has no effect on frameshifting, demonstrating that only three codons from the 14 nucleotide sequence are necessary to direct frameshifting. no other ty-derived sequences are present, yet frameshifting is occurring at wild-type levels. deletion of an additional codon from the 3' end or deletion of one or two codons from the 5' end results in a much lower frequency of frameshifting (table 2). all three constructs are at least 40-fold lower in expression than the plasmid retaining the first three codons of the 14 nucleotide sequence. since no construction lacking any of the first three codons expresses significant levels of β-galactosidase, we conclude that the first three codons of the 14 nucleotide sequence, cuu-agg-cca, are necessary to direct frameshifting. this does not mean that each nucleotide is necessary, since the deletions do not allow us to map effects at a resolution of less than one codon.

defines 7 essential nucleotides

the deletion analysis described above crudely defines the sequence that promotes frameshifting.
to define which nucleotides are critical for frameshifting, we synthesized a series of nine oligonucleotides. mixed synthesis was done with the nucleotides corresponding to the three desired mutations at each position of the 9 nucleotide minimal sequence defined above. the three mutations at each position were tested for their effect on frameshifting after cloning into the his4a::lacz frameshift reporter plasmid pmb38 between the bamhi and kpni sites as described in experimental procedures. the results are depicted graphically in figure 2. the nucleotide immediately 5' of the 9 nucleotide minimal sequence is unimportant for frameshifting, as it can be any of the four nucleotides. the eighth and ninth nucleotides can also be changed to any base without affecting frameshifting. however, any change to the remaining 7 nucleotides drastically reduces frameshifting. the essential bases, cuu-agg-c, include two codons of the tya reading frame plus a seventh base. as can be seen from closer inspection of the data, some changes allow lower levels of frameshifting that are above background. c or g substitutions for u at the wobble position of the first codon allow approximately 2% frameshifting. substitution of c or u for a at the first position of the second codon and replacement of g with u at the wobble position of the second codon also allow lower levels of frameshifting. the reason for the necessity of the seventh base is unknown but can probably be attributed to a context effect on decoding of the preceding codon. context effects are known to involve the nucleotide immediately 3' of a codon (bossi, 1983; bossi and roth, 1980; carrier and buckingham, 1984; fluck et al., 1977; miller and albertini, 1983; murgola et al., 1984; weiss, 1984; weiss and gallant, 1988).

frameshifting occurs when peptidyl-trnaleu is bound to the cuu codon and does not involve simultaneous slippage

the 7 nucleotide minimal sequence defined above includes two unusual features.
first, there are overlapping leucine codons (cuu and uua) in the 0 and +1 reading frames. in nearly all organisms these two codons are decoded by distinct isoaccepting species. yeast is unusual in that it encodes a trna, trnaleu(uag) (previously known as trna:b"), that decodes all six leucine codons (weissenbach et al., 1977). the expanded recognition requires an unmodified uracil at the wobble position of the anticodon (randerath et al., 1979). frameshifting could involve slippage of this trna between the two leucine codons. second, the next codon, agg (arg), is recognized by a low-abundance trna encoded by a single nuclear gene, trnaarg(ccu). deletion of this gene is lethal to yeast (gafner et al., 1983).

figure 2. shown boxed on the x-axis is the wild-type sequence of the frameshift site. the hatched portion indicates the essential nucleotides for frameshifting. above each boxed nucleotide are the missense substitutions generated for each position of the sequence. the percent frameshifting for each substitution is shown graphically.

figure 3. (a) predicted amino acid sequence through the 7 nucleotide frameshift site if ty elements employ a peptidyl-trna slippage method of frameshifting as described in the text. a partial rna sequence from the aug codon is provided for reference. (b) predicted amino acid sequence through the 7 nucleotide frameshift site (amino acids shown numbered) if ty elements employ the simultaneous-slippage method of frameshifting seen in retroviruses of higher cells. a partial rna sequence from the aug codon is provided for reference. (c) amino acid sequence data through the 7 nucleotide frameshift site. histograms of relevant pth-amino acids through 11 cycles of edman degradation are shown with the major amino acid present in each cycle indicated above the histograms and above the mrna sequence in (b). cycle 8, which is predicted to yield his according to the rna sequence, gave a weak signal, and no assignment was made. no other amino acid can be detected. the 14 nucleotide frameshift site has a gln residue (rather than his) following gly. the amino acid sequence through the 14 nucleotide site confirms the gln residue after gly (data not shown), supporting the notion that translation shifts to the +1 reading frame by peptidyl-trna slippage on the cuu-leu codon.

two models can account for the necessity of the cuu and agg codons. in rsv, hiv-1, mmtv, ibv, bovine leukemia virus (blv), and the simian retrovirus type 1 (srv-1), a simultaneous slippage of two trnas causes a -1 frameshift (jacks et al., 1988a, 1988b; wilson et al., 1988). the model requires that both the a and p sites be occupied simultaneously to accomplish the frameshift. this could be the case for ty, except that the simultaneous slip would be in the +1 direction, from cuu-agg to uua-ggc. this would predict a protein sequence of leu-arg-his through the frameshift site (figure 3b). another model predicts that the ribosome encountering the agg codon pauses because of the low availability of trnaarg(ccu). in this state, with peptidyl-trnaleu(uag) in the p site and a vacant a site, the peptidyl-trnaleu(uag) slips from the cuu reading frame to the uua (+1) reading frame. this places the ggc (gly) codon in the a site. trnagly enters the a site and translation continues in the +1 reading frame. this model predicts a protein sequence of leu-gly-his (figure 3a). to distinguish between the two above scenarios as well as other models, we determined the peptide sequence through the frameshift site. the his4a::lacz fusion plasmid p3p has a bamhi site incorporated after the first two codons of the his4 gene. the construction is described in experimental procedures and depicted in figure 3. the 9 nucleotide frameshift site, flanked by termination codons, was cloned at the bamhi site such that lacz expression requires a +1 frameshift within the minimal sequence. after purification of the his4a::lacz fusion product, the protein was subjected to 14 cycles of edman degradation.
the relevant pth-amino acids through the first 11 cycles are shown in figure 3c. the data clearly support the trnaleu(uag) slippage model. glycine rather than arginine is incorporated after leucine, indicating that the frameshift event occurs on the cuu leucine codon. no significant amount of arginine is detected at cycle 7, ruling out a cuu-agg simultaneous-slippage model of frameshifting. these data do not rule out simultaneous slippage on cuu and the codon immediately upstream of it; however, this is very unlikely since that codon is not from the ty overlap and can be changed at any position with no effect on frameshifting. therefore, the most likely model is one involving slippage of a single trna bound to the cuu codon.

frameshifting occurs when the agg codon is unoccupied by its cognate trna

a correlation exists between the abundance of yeast trnas and the occurrence of their respective codons (ikemura, 1982). the second codon (agg) of the frameshift site is recognized by a low-abundance trna encoded by a single gene. as expected, the codon agg is rare in yeast genes (aota et al., 1988). the fact that the rate of aminoacyl-trna binding to the ribosomal a site is proportional to its concentration (thompson et al., 1980) has been taken to mean that codons recognized by abundant trnas are decoded quickly, while codons recognized by nonabundant trnas are decoded slowly. recent evidence suggests that translation rate is not strictly correlated with trna concentration, since some nonabundant trnas are decoded more quickly than some much more abundant trnas (bonekamp et al., 1989). though the rule that all low-abundance trnas decode slowly may not be universal, it is true that some codons are very slowly decoded. interestingly, the trnaarg(ccu) of e. coli, which is also a rare trna, has a low intrinsic decoding rate (bonekamp and jensen, 1988).
it is possible that the very low abundance of trnaarg(ccu) is necessary to promote frameshifting by causing a translational pause analogous to the function of the nonsense codon and "hungry" codons in e. coli frameshifts (weiss et al., 1987). if this is true, then it may be possible to modulate frameshifting by modulating the amount of the rare trna. in particular, increasing the in vivo concentration of the trna should decrease the putative pause, and thus decrease the rate of frameshifting. the gene for trnaarg(ccu) was cloned onto a frameshift reporter plasmid, pmb38-9merwt, as described in experimental procedures. the copy number of this plasmid is approximately four per cell, increasing the total number of copies of the trnaarg(ccu) gene to five per cell (including the one endogenous copy). the plasmid contains the his4a-9 nucleotide frameshift site-lacz reporter gene, which normally shows 40% frameshifting (table 3). the 5-fold increase in copy number of the trnaarg(ccu) gene severely inhibited frameshifting (table 3). the fact that increasing the concentration of trnaarg(ccu) causes frameshifting to decrease is incompatible with any model in which the frameshift event is stimulated in part by binding of trnaarg(ccu) to the ribosomal a site. rather, trnaarg(ccu) must "act" by being absent from the ribosome since increasing its ability to occupy the a site has a negative effect on frameshifting. we conclude that competition for the rare trnaarg(ccu) induces a translational pause that is essential for the frameshift event and that frameshifting occurs when the ribosomal a site is unoccupied.

frameshifting depends upon trna slippage between the cuu and uua leucine codons

the missense mutagenesis data demonstrate that the cuu codon is essential for frameshifting (figure 2). we wanted to test directly if frameshifting depended upon frame slippage by a peptidyl-trnaleu.
if trnakg promotes frameshifting by frame slippage, then the rate of frameshifting should be directly related to the frequency that that trna is used to decode the cuu codon. if another isoaccepting species, incapable of frame slippage, were to compete with trna$ for binding to the cuu codon, frameshifting would be reduced. have proposed a different model for ribosomal frameshifting at "hungry" codons in e. coli. the model predicts that in a situation where the a site is unoccupied during a very long pause, a trna can bind out of frame, either +l or -1, in violation of the normal frame maintenance mechanism. if this model were to operate in the ty case, one would see the same peptide sequence at the frameshift site that we determined, leu-gly-his. however, the ability of the peptidyl-trnaleu to slip would be irrelevant. to test the model of peptidyl-trnaleu slippage, we constructed three novel trnas bearing modified anticodons by modifying the anticodon of a gene for the most abundant isoacceptor, trna&, also known as trnap (see experimental procedures). the first, with the anticodon aag (trna&), can decode cuu but is incapable of recognizing the overlapping uua codon. the second has the anticodon uag (termed trna&, to distinguish it from the wild-type trna$), and the third is an ochre suppressor with the anticodon uua (trna:e,",). the latter two serve as controls since neither is expected to interfere with frameshifting. changing the anticodon should have no effect on charging of the trna by leucine aminoacyl-trna synthetase for two reasons. first, leucine is encoded by six codons, the only common feature of which is the central u residue. second, functional amber and ochre suppressor forms of trnaleu exist (sherman, 1982) . the trna genes were cloned into the plasmid pmb38-9merwt (see previous section) and introduced into yeast. the rate of frameshifting was drastically lower in the presence of trna!$ but not with trna&&, while trna& had no effect (table 3 ). 
the 43-fold decrease in frameshifting in the presence of trnafa; suggests that it has competed away 98% of the binding of a trna responsible for frameshifting. although the data do not directly identify trnapig as the "shifty" trna, they do demonstrate that the ability of the trna decoding the cuu codon to slip to uua is essential for ty ribosomal frameshifting. other codons decoded by rare trnas cannot substitute for agg if the agg codon of the frameshift site induces a translational pause as a result of the low availability of its cognate trna, is it possible to induce a translational pause by substituting a codon recognized by other low-abundance trnas? we replaced the agg codon with two other codons known to be recognized by trnas present in low concentration: cgg, decoded by trna&o (m. culbertson and i. edelman, personal communication); and ucg, decoded by trnap$, (etcheverry et al., 1982; olson et al., 1981) . like the trna&u gene, the trna&jo and trna&* genes are present in single copies in the haploid yeast genome and are lethal to the cell if deleted, reflecting the fact that no other trnas are capable of decoding the cgg or ucg codons. oligonucleotides containing cgg or ucg at the agg position of the frameshift site were cloned into the frameshift reporter plasmid pmb38 as described in experimental procedures. after introduction into yeast, the constructs were analyzed for 8-galactosidase production. both produced very low levels of enzyme, corresponding to 1.8% frameshifting for cgg and 0.3% frameshifting for ucg ( table 4 ). the cgg construct does support a significant level of frameshifting but still is about 25fold below that of the wild-type construct with identical flanking sequences. a possible problem with this experiment is the elimination of the uua leucine codon in the +l reading frame. 
perhaps trna$& cannot slip from cuu to uuc or uuu, both phenylalanine codons recognized by a single isoacceptor, trng'&, also known as trnaphe (rajbhandary et al., 1987; valenzuela et al., 1978) . to control for this, oligonucleotides were constructed containing the same substitutions for agg as above, but with only the first position c of the cuu codon changed to u. we reasoned that trn@& should be able to shift from uuu to uuu or uuc of the +l reading frame of each construct (table 4) because the hiv-1 frameshift event can occur in the -1 direction on a string of 8 uracil residues in yeast (wilson et al., 1988) . as can be seen from the data, this change did not improve the ability of the sequences to support frameshifting (table 4) . a more controlled experiment utilizes a well-characterized set of trna& gene deletion constructs. the uay codon is decoded by an abundant trna (trna&; also called trnatyr) encoded by eight unlinked genes in the haploid yeast genome (olson et al., 1977) . burke (1988) demonstrated that the sequential deletion of each of the trna@,, genetic loci results in a linear decrease in the pool of aminoacyl-trnatyr within a cell. deletion of up to four of the genes has no detectable phenotype; however, deleting five results in a 50% increase in doubling time, deleting six increases doubling time by 800/o, and deleting seven is lethal. the increase in doubling time unambiguously indicates that cell growth is limited by the availability of the trna, suggesting that competition for the trna would be severe in at least those strains. we introduced a frameshift reporter plasmid into strains containing two to eight copies of the trn@& genetic loci. this plasmid contained one of three versions of the 9merwt frameshift site: one with the wild-type sequence (cuu-agg-cca), one substituting a uau tyrosine codon for agg, or one substituting the codons uuu-uau for cuu-agg (table 5 ). 
the second construct would require trna& to slip from cuu (leu) to uuu (phe). to make a slip more likely, the last construct requires trna& to slip from the 0 frame uuu codon to the +l frame uuu codon to express /acz. we reasoned that as the aminoacyl-trna&, pool drops with decreasing copies of the trna&, gene, the possibility of a translational pause at the uau codon would increase because of the apparent severe competition for the trna. this pause may allow a frameshift event to occur at the upstream codon. the results are displayed in table 5 . neither construct incorporating a uau "pause" codon allows appreciable levels of frameshifting in any of the deletion strains, while the wild-type construct shows normal amounts of frameshifting. we conclude from these data that the fact that a codon is decoded by a limiting trna in itself is not sufficient to cause a translational pause capable of inducing detectable translational frameshifting. is inhibited by proximity to the initiation codon we previously demonstrated that placing the 14 nucleotide frameshift site in immediate proximity to the translational initiation codon inhibits frameshifting (clare et al., 1988) . placing the cuu codon of the frameshift site three codons downstream of the ty2-917 aug codon abolished frameshifting. when this codon was positioned 31 codons downstream, frameshifting was restored. this suppression could be a context effect around the ty2-917 initiation codon (in which case the sequence around the initiation codon would suppress frameshifting even if positioned far downstream within a gene), or it could be an effect of initiation itself. to test the former possibility, we synthesized an oligonucleotide encompassing a region from 15 bp upstream of the ty2-917 aug, through the 14 nucleotide frameshift site (which was fused at the ty2-917 aug), and 15 bp downstream. 
the oligonucleotide, when inserted between a tya917 deletion 929 bp downstream of the initiation codon and the lacz gene (in the +1 reading frame), supported normal levels of frameshifting (data not shown). this suggests that proximity alone causes the suppression of frameshifting. to determine the minimal distance between the aug initiation codon and the 14 nucleotide frameshift site, we placed the frameshift site at increasingly farther codon intervals from the his4 initiation codon. bamhi sites were introduced in the his4a gene at codon intervals beginning at the second codon and ending at the seventh codon as described in experimental procedures. introduction of the 14 nucleotide frameshift sequence into each of the six "proximity constructs" allowed a measure of frameshifting into the downstream lacz gene. as we saw with the ty2-917 construct, placement of the cuu codon of the frameshift site three codons downstream of the his4 aug in construction p2p resulted in no frameshifting (table 6). however, construction p3p, which places the frameshift site at a position four codons downstream, shows appreciable levels of frameshifting, with frameshifting occurring at about 7%. placement of the site five, six, seven, or eight codons downstream of the aug codon showed frameshift activity at near wild-type levels. we hypothesize that the ribosome becomes competent to frameshift during the initial stages of elongation following initiation, apparently with an abrupt transition after the fourth codon of the gene. translation initiation in eukaryotes occurs by a mechanism fundamentally different from that in prokaryotes. the ribosome binds to the 5' end of the mrna and "scans" to the initiation codon, usually the first aug codon in the message. the aug codon acts as a starting point for translation and establishes the reading frame of the message.
disruption of the reading frame occurs only rarely during elongation, but several instances of reading frame shifts have been observed. the phages f2 and ms2 (beremand and blumenthal, 1979; kastelein et al., 1982) express their lysis genes through a mechanism that involves frameshifting within the upstream coat cistron. spontaneous readthrough of leaky frameshift mutations occurs within the oxi1 gene of yeast mitochondria (fox and weiss-brummer, 1980). the addition of a carboxy-terminal extension onto the phage t7 major coat protein also occurs by a frameshift during translation (dunn and studier, 1983). synthesis of the e. coli peptide release factor 2 (rf2) requires a +1 frameshift at an in-frame uga codon (craigen and caskey, 1986; craigen et al., 1985), while some retroviruses and coronaviruses of higher eukaryotes employ a -1 frameshift to express the products of their pol (or pro) and f2 genes, respectively (brierley et al., 1987, 1989; jacks et al., 1987, 1988a, 1988b; jacks and varmus, 1985; wilson et al., 1988). frameshifting in the two best-studied systems, the rf2 gene of e. coli and retroviruses of higher eukaryotes, has several unifying features. codon 26 of the rf2 gene of e. coli is an in-frame uga termination codon, one of two codons recognized by rf2 (scolnick and caskey, 1969). the uga codon is probably the site of an autogenous control system. instead of prematurely terminating at the uga codon, 50% of the ribosomes shift into the +1 frame at the position of the cuu (leu) codon immediately preceding the uga (craigen and caskey, 1986; craigen et al., 1985). in an exhaustive study, weiss et al. (1987) have defined the minimal requirements of frameshifting in e. coli.
they find that the three things necessary for frameshifting to occur are a "slippery run," or repeat of several identical nucleotides in the rna, followed immediately by a termination codon and preceded by a shine-dalgarno sequence (shine and dalgarno, 1974) a precise distance away. constructs were made that frameshifted -2, -1, +1, +2, +5, and +6 with varying efficiency. the proposed mechanism involves binding of a trna in the "slippery run." a translational pause, induced by the slow process of termination at the nonsense codon, allows an interaction between the shine-dalgarno-like sequence upstream of the frameshift site and the 16s rna. the interaction "pulls" or "pushes" the ribosome, causing the trna to slip along the slippery run. translation then continues in the new frame. the mechanism of -1 frameshifting in retroviruses has been shown to involve both a specific "slippery" sequence and, in some cases, a conserved secondary structure. jacks et al. (1988a) showed that three subsets of repetitive nucleotides conserved among retroviruses constitute the sites of frameshifting. protein sequence analysis and site-directed mutagenesis of the gag-pro fusion junction from mmtv (jacks et al., 1987), hiv-1 (jacks et al., 1988b; wilson et al., 1988), and rsv (jacks et al., 1988a) demonstrated that the event occurs at these repetitive sequences: in mmtv at the sequence a-aaa-aac (shown as gag frame codons), in hiv-1 at the sequence u-uuu-uua, and in rsv at the sequence a-aau-uua. similar sequences from ibv, blv, and srv-1 can also support frameshifting (jacks et al., 1988a). most retrovirus frameshift events also require a conserved pseudoknot structure located immediately downstream of all retroviral frameshift sites (jacks et al., 1988a), a requirement also seen in ibv frameshifting (brierley et al., 1989). notably, hiv-1 does not require its pseudoknot structure for frameshifting (jacks et al., 1988a; wilson et al., 1988).
this pseudoknot structure may be required to induce a translational pause (jacks et al., 1988a). like the translational pause in the e. coli case, the pause induced by the pseudoknot allows frameshifting to occur on the slippery sequence. unlike e. coli, frameshifting appears to involve two trnas that recognize two codons of the slippery sequence in the gag reading frame. jacks et al. (1988a) showed that the frameshift event involves a simultaneous slippage of these trnas in the ribosomal a and p sites. because certain changes are allowed in the a and p site codons and because several different sites exist (see above), more than one trna (termed "shifty" trnas) is competent to frameshift. frameshifting in ty elements initially appeared to lack all of the features identified in other frameshift sites. no homopolymeric string of nucleotides is necessary in the region of the frameshift event. frameshifting requires neither a nearby termination codon nor a downstream rna secondary structure to induce a translational pause. previously, we showed that a 14 nucleotide sequence is necessary and sufficient to induce ribosomal frameshifting even when placed within a completely heterologous context, the his4 gene of yeast (clare et al., 1988). rna sequencing of an mrna containing this frameshift site demonstrated that the dna and rna sequences are exactly colinear, ruling out any pretranslational mechanism. in this report we show that frameshifting occurs within the 14 nucleotide sequence. since the sequence is flanked with termination codons, upstream in the tyb reading frame and downstream in the tya reading frame, frameshifting is confined to occur within the window between the termination codons. expression of the downstream lacz gene, in the +1 reading frame, is unaffected by the termination codons. we show that the frameshift event is directional and reading frame specific.
three different constructs allow translation to occur into each of the three reading frames of the frameshift site. for each construct, the lacz gene is fused downstream in all three reading frames. this allows an assay for frameshifting from each of the reading frames in either the +1 or -1 direction. translation into the frameshift sequence in the 0, or tya, reading frame (the normal reading frame of the frameshift site) allows wild-type levels of frameshifting in the +1 direction. however, translating the +1 or -1 reading frames allows no frameshifting in the +1 direction, demonstrating the reading frame specificity of frameshifting. frameshifting does not occur in the -1 direction from any of the three reading frames, indicating that frameshifting is directional.

a model for ty frameshifting

deletion analysis and missense mutagenesis define 7 nucleotides essential for frameshifting. this sequence, cuu-agg-c, has some interesting characteristics. overlapping leucine codons, cuu in the 0 frame and uua in the +1 frame, are recognized by a single isoacceptor in yeast, trnaleu(uag). this trna can recognize all six of the leucine codons by virtue of an unmodified uracil in its wobble position (randerath et al., 1979; weissenbach et al., 1977). modified uracil residues recognize only purines, whereas an unmodified uracil can recognize all four bases (heckman et al., 1980; sibler et al., 1988); unmodified uracil residues (and consequently, recognition of four nucleotides) are common in mitochondrial trnas (barrell et al., 1980; bonitz et al., 1980; heckman et al., 1980; sibler et al., 1988), where a much smaller number of trnas is sufficient to translate all the codons of the mitochondrial genome. this characteristic makes this trna an ideal candidate for a shifty trna, as it is a major isoacceptor for leucine (present at 48% the level of the most abundant isoacceptor, trnaleu(caa); ikemura, 1982) and satisfies wobble rules in both reading frames.
the second unique characteristic of this sequence is the agg codon. this codon is recognized by a low-abundance trna, trnaarg(ccu), encoded by a single-copy essential gene. these two characteristics suggest a model for frameshifting that retains features of frameshift sites in other systems. a slippery sequence, cuu in the 0 frame to uua in the +1 frame, is possible because of the unique trnaleu(uag). a translational pause could be induced by the low-abundance trnaarg(ccu). pausing and frameshifting at "hungry" codons, codons whose cognate trna is in short supply, occurs in other systems (spanjaard and van duin, 1988) and, in ty elements, may satisfy the apparent requirement for a translational pause to induce frameshifting. the model, diagrammed in figure 4, states that ribosomes encountering the agg codon pause because of the low availability of trnaarg(ccu). peptidyl-trnaleu(uag), in the p site of the ribosome (on the cuu codon), slips forward +1 during the pause onto the uua codon. translation continues in the +1 reading frame. we have provided experimental evidence to support this hypothesis. peptide sequencing through the frameshift site indicates frameshifting at the cuu codon, eliminating the simultaneous-slippage model of frameshifting seen in retroviruses. overproduction of trnaarg(ccu) decreases frameshifting approximately 43-fold, presumably by eliminating the translational pause at the agg codon. since trnaarg(ccu) is now abundant in this background, translation continues in the 0 frame and terminates immediately downstream. this suggests that a translational pause is an essential component of the frameshift event in ty as it seems to be in other systems. significantly, a high copy number suppressor of ty transposition maps to the trnaarg(ccu) gene, a result that directly supports the importance of frameshifting in the expression of the tyb gene and hence in transposition (h. xu and j. d. boeke, personal communication). a second result implicates trnaleu(uag) as the slippery trna.
since this trna may be the only trna that can decode cuu, we reasoned that overproduction of a second trna that decodes cuu exclusively would compete with trna-leu(uag) and reduce frameshifting. this second trna would not be expected to be able to slip onto uua. changing the anticodon of a second leucine trna so that it decodes only cuu, and overproducing this trna, results in a 43-fold inhibition of frameshifting. clearly, this result demonstrates the importance of trna-leu(uag) in the frameshift event.

figure legend (frameshifting model): the first step necessitates recognition of the cuu codon by trna-leu(uag). owing to the low availability of trna-arg(ccu), a translational pause at the agg codon allows time for a frameshift to occur at step 2. the commitment to frameshifting occurs at step 3 when the next +1 frame codon is recognized by its cognate trna.

as indicated by the missense mutagenesis data, some changes in the 7 nucleotide sequence allow frameshifting at low levels, approximately 20- to 30-fold below that of the wild-type sequence. it is difficult to ascertain why frameshifting can occur in some of these cases; however, the two cases showing the highest levels of frameshifting may share some features with the wild-type sequence. the a to c change at the first position of the second codon produces a cgg arginine codon in the 0 frame and a uuc phenylalanine codon in the +1 frame. cgg is a codon recognized by a rare arginine trna. a possible reason why this sequence does not support higher levels of frameshifting is seen in the codon usage pattern for cgg. cgg appears at a level 7-fold below that of agg in yeast genes (aota et al., 1988). the lower demand for the trna that decodes cgg could reduce the length of the translational pause needed for frameshifting. this should result in a lower proportion of frameshift events occurring before translation continues by insertion of arginine at the in-frame cgg codon.
the g to u change at the third position of the second codon retains the cuu to uua slippery sequence but changes the agg-arg codon to agu-ser. agu must also allow a shorter pause; the trna that decodes it has not been identified, so we do not know if it is a rare or abundant trna. thus far, attempts to create a frameshift site by replacing the agg codon with other codons recognized by nonabundant trnas have failed. an experiment that definitively tests this hypothesis uses a family of strains retaining from eight to only two of the trna-tyr genes, the expectation being that when the cell is limited for the trna decoding the two tyrosine codons, competition for the aminoacyl-trna will be severe and a substantial translational pause should result. the fact that this, too, failed forces us to conclude either that competition for trna-arg(ccu) is especially severe or that the translational pause induced by competition for the trna is not the only function of the agg codon, the second role not being satisfied by the other codons tested. a more exhaustive search for possible substitute "pause" codons may determine which possibility is correct. remarkably, frameshifting in ty elements is inhibited by the proximity of the frameshift site to the translational initiation codon. this phenomenon was first observed by placement of the 14 nucleotide frameshift site within four codons of the initiation codon of ty2-917 (clare et al., 1988). frameshifting was completely inhibited. similarly, placement of the site the same distance from the his4a initiation codon inhibits frameshifting. addition of another codon between the initiation codon and the frameshift site restores frameshifting. this proximity effect has not been observed or examined in other systems and may be a general phenomenon. we hypothesize that translation immediately after initiation might be fundamentally different from that during the elongation phase. this difference could perturb the ribosome's frameshifting ability.
experiments are under way to understand what elements determine the ribosome's ability to frameshift early in elongation. yeast strains, media, and general methods the s. cerevisiae strain used for this work unless otherwise indicated is 337-1d (a hi&a38um3-52 trpl-289 ho/j-l). analysis of directionality plasmids was also done in strain lo361 (a his&z8 lysl-1-o met8-l-am sup43 ura3-52) kindly provided by gerald fink. this strain carries a temperature-sensitive amber suppressor, suppressing at 3tx but not at 37%. analysis of tyrosine codon substitutions was done in strains kindly provided by maynard olson. the strains are deleted for one to six copies of the eight trna& genetic loci (sup2-3, 71) and have the following genotype: maw or a a&2-l-o /ys2-7-o trp52-o leul-12 canlloo-0 ~~3-1 met4-l-o (burke, 1988) . strain abl143 is deleted for sup8; the plasmid pmb38 is shown in figure 1 . this plasmid contains the ura3 gene and the 2um origin of replication from plgsd5 (guarente et al., 1982) and a struncated /acz gene from pmc1790 (casadaban et al., 1983) ; the construction of this portion of the plasmid has been described (liao et al., 1987) . the his4.a gene was introduced as a 1.2 kb sall-bamhi fragment between the xhol and bamhl sites of pnoup (liao et al., 1987) . replacement of the bamhi-sacl laczfragment with the complementary fragment containing the bamhi-kpnl linker at the 5'end of /a&creates pmb38. the laczgene is fused via a bamhi-kpnl linker to the his4a gene 33 codons downstream of the hi.%4 initiation codon. missense, codon substitution, and directionality oligonucleotides containing the 9 bp and 14 bp frameshift sites were introduced between the bamhl and kpnl sites as described by derbyshire et al. (1986) . the sequences of directionality oligonucleotides are shown in table 7 . missense and codon substitution oligonucleotides have the following general sequence: sgatccgctgacactt~gcca-tgaggtac-3'. 
this oligonucleotide contains 5'and 3'overhangs complementary to bamhl and kpnl half-sites, respectively. this allowed cloning into pmb38 as described above. the underlined region was subjected to missense mutagenesis as described in the text or changed to specific nucleotides as described in the section on codon substitutions (see table 7 ). an oligonucleotide identical to this except for an addition of 2 nucleotides following the underlined sequence fuses the upstream and downstream reading frames in frame. this oligonucleotide (called smer-fusion; table 7 ) allows for quantitation of the level of frameshifting. top and bottom strands of flanking terminator and codon deletion oligonucleotides, depicted in table 7 , were annealed and cloned into the bamhl site of pmb25 as described in clare et al. (1988) . pmb25 is identical to pmb38 except that it lacks the kpnl site at the junction of h/wa and lacz. construction of "proximity" mutations in the his4a gene was done by utilizing the polymerase chain reaction (pcr). an oligonucleotide, sal-upstream (see table 7 ) of which the 3' 20 nucleotides are complementary to the noncoding strand of the his4.a gene, primes synthesis of the coding strand beginning 708 nucleotides upstream of the protein coding region (donahue et al., 1982) . downstream, six oligonucleotides, of which the 3'20 nucleotides are complementary to the coding strand, prime synthesis of the noncoding strand at codon intervals immediately after the initiation codon. the sequences of these oligonucleotides (called 2p-7p) are listed in table 7 . pcr reactions were done as described by the manufacturer (cetus corporation) on plasmid psal1, which contains the complete his4 gene. the upstream oligonucleotide incorporates a sall restriction site with the downstream oligonucleotides incorporating a bamhl site. the pcr-generated frag ments were digested with sall and bamhl and cloned into thexhol and bamhl sites of pnoup. 
subsequently, the bamhi-sacl laczfragment of pnoup was replaced by either the bamhi-sacl /acz fragment of pmbs&smervvt or the bamhi-sacl /acz fragment from pty1812al225-co (clare et al., 1988) to incorporate the 9 bp and 14 bp frameshift sites, respectively. the plasmid used to overexpress trna& was pmb3e9merwt this plasmid contains the missense oligonucleotide depicted above with the wild-type 9 nucleotide frameshift site. the unique tthllll site was blunted with dna polymerase i large fragment and joined to sall 8-mer linkers (new england biolabs). the plasmid hi3 (gafner et al., 1983 ) the kind gift of peter philippsen, contains a 2.15 kb xhol fragment carrying the gene for trna!&. this xhol fragment was cloned into the sall site of pmb38-9merwt to create the plasmid pmb36-smerwt-trna$$ le" the trna genes encoding trna& trna,,,, and trna,u, "" were overexpressed on the same pmb38-9merwf plasmid. a gene for trna& is found immediately upstream of the leup gene. pcr was performed on a plasmid bearing the 2.2 kb xhol-sall fragment carry ing leup using the sal-downstream and enx-upstream primers (see table 7 ). the product was a 548 bp fragment with ecori, ndel, and xhol sites at the upstream end of the trna fragment and a sall site at the downstream end. ndel-and sallcleaved fragment was cloned into pucl3 between the unique ndel and sall sites. the resulting clone lacks the unique narl site of puc13, but introduces a unique narl site in the insert starting 11 bp upstream of the anticodon of trnalga. to construct the three codon replacements of trnal&, we adapted a pcr procedure in which a whole plasmid is amplified using two adjacent, divergent primers (hemsley et al., 1989) . the narl-opposite strand primer (table 7) primes synthesis from the unique narl site in the direction away from the trnalga anticodon. three primers incorporating the novel anticodons, the aag, uag. and aau primers (table 3) prime synthesis from the narl site in the opposite direction. 
the product is a linear form of the plasmid with narl sites at either end carrying the novel anticodons. the product is digested with narl and recircularized to recover the plasmids bearing the novel trna genes. the xhol-sal1 fragment encompassing each gene was transferred into the sal1 site of pmb38-9merwt and isolates of each in which the trna gene is oriented pointing toward the 3'end of the /acz gene of the plasmid were selected. these plasmids are pmb38-smerwt-trna&& pmb38-smerwt-trnat&, and pmb3&9merwt-trna& pgalactosidase purification and protein sequencing the protocol for p-galactosidase purification from yeast was provided by robert weiss and is essentially as described (weiss et al., 1989) . yeast strain 387-ld, transformed with plasmid p3p-9merwt. was grown at 30% in 18 liters of sd minimal medium supplemented with 20 mgll of histidine and tryptophan and 2% glucose. at a culture density of ~'&i = 1, the cells were pelleted, washed with one-fourth volume of buffer a (50 mm tris-hci [ph 7.41, 150 mm naci, 5 mm edta. 0.1% tween 20, 10 mm &mercaptoethanol, and 0.5 mm pmsf), and then resuspended in the same buffer (3 ml of buffer per gram of cells). cells were broken using 10 g of 400 urn glass beads (sigma) per 4 ml of suspension by vortexing for 4 min with 1 min incubations on ice between each minute of vortexing. cell debris and beads were pelleted by centrifugation at 2000 x g for 15 min. the resulting supernatant was then centrifuged for 90 min at 100,000 x g in a beckman tlloo centrifuge to pellet organelles. a clear supernatant was collected and passed over an anti+-galactosidase immunoaffinity column (protosorb, promega biotec) equilibrated with buffer a. the eluate was concentrated with a centricon 30 centrifuge chamber (amicon), washed four times with hplc-grade water, and then placed in the cartridge of an applied biosystems 475a protein sequencer equipped with an on-line applied biosystems 120a high performance liquid chromatography analyzer. 
β-galactosidase assay and dna sequencing

three transformants of each plasmid were each assayed in triplicate for β-galactosidase activity as described (farabaugh et al., 1989). all constructions described in this paper were sequenced by the chain termination technique (sanger et al., 1977) from plasmids prepared as described by mierendorf and pfeffer (1987). we are grateful to robert weiss for providing us with the protocol for β-galactosidase purification from yeast and for sequencing the frameshift protein. we thank p. philippsen, m. olson, and g. fink for providing the clones and yeast strains indicated above. this work was supported by u.s. public health service grant gm 29480. received april 6, 1990; revised may 9, 1990. the functions and relationships of ty-vlp proteins in yeast reflect those of mammalian retroviral proteins codon usage tabulated from the genbank genetic sequence data different pattern of codon recognition by mammalian mitochondrial trnas ty elements transpose through an rna intermediate the agg codon is translated slowly in f. 
co/i even at very low expression levels codon recognition rules in yeast mitochondria context effects: translation of uag codon by suppressor trna is affected by the sequence following uag in the message the influence of codon context on genetic code translation an efficient ribosomal frameshifting signal in the polymerase-encoding region of the coronavirus ibv characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an rna pseudoknot molecular genetics of yeast chromosomes evidence for transposition of dispersed repetitive dna families in yeast an effect of codon context on the mistranslation of ugu codons in vitro 6galactosidase gene fusions for analyzing gene expression in escherichia co/i and yeast trnay' functions in directing the scanning ribosome to the start site of translation nucleotide sequence of a yeast ty element: evidence for an unusual mechanism of gene expression efficient translational frameshifting occurs within a conserved sequence of the overlap between the two genes of a yeast tyl transposon structure of yeast phenylalanine-trna genes: an intervening dna segment within the region coding for the trna retroviruses. in mobile genetic elements molecular model of ribosome frameshifting frameshift suppression in aminoacyl-trna limited cells slippery runs, shifty stops, backward steps, and forward hops: -2, -1, +l, +2, +5, and +6 ribosomal frameshifting. cold spring harbor symp reading-frame switch caused by basepair formation between the 3'end of 16s rrna and the mrna during elongation of protein synthesis in escherichia co/i f. 
co/i ribosomes re-phase on retroviral frameshift signals at rates ranging from 2 to 50 percent on the mechanism of ribosomal frameshifting at hungry codons yeast trnalbu (anticodon uag) translates all six leucine codons in extracts from interferon treated ceils expression strategies of the yeast retrotransposon ty: a short sequence directs ribosomal frameshifting hiv expression strategies: ribosomal frameshifting is directed by a short sequence in both mammalian and yeast systems craigen, w. j.. and caskey, c. t. (1986) . expression of peptide chain release factor 2 requires high-efficiency frameshift. nature 322, 273-275.craigen, w. j., cook, r. g.. tate, w. p., and caskey, c. t. (1985) . bacterial peptide chain release factors: conserved primary structure and possible frameshift regulation of release factor 2. proc. natl. acad. sci usa 82, 3616-3620.derbyshire, k. m., salvo, j. j., and grindley, n. d. f. (1986) . a simple and efficient procedure for saturation mutagenesis using mixed oligodeoxynucleotides.gene 46, 145-152.donahue, t. f., farabaugh, f? j., and fink, g. r. (1982) . fox, t. d., and weiss-brummer, 8. (1980) . leaky +l and -1 frame shift mutations at the same site in a yeast mitochondrial gene. nature 288, 60-63.gafner, j., de robertis, e. m., and philippsen, f? (1963) . delta sequences in the 5' non-coding region of yeast trna genes, embo j. 2, 583-591.garfinkel, d. j., boeke, j. d., and fink, g. r. (1965) . ty element transposition: reverse transcriptase and virus-like particles, cell 42, 507-517.guarente, l., yocum, r., and gifford, t? (1982) . a gallo-cyci hybrid yeast promoter identifies the gal4 regulatory region as an upstream site. proc. natl. acad. sci. usa 79, 7410-7414. f, and hall, 8. d. (1977) . molecular characterization of the tyrosine trna genes in yeast. nature 267, 639-641.olson, m. v., page, g. s., sentenac, a., piper, f! w., worthington, m.. weiss, r. b., and hall, 8. d. (1981) . 
only one of two closely related yeast suppressor trna genes contains an intervening sequence. nature 291, 464-469. rajbhandary, u. l., chang, j. h., stuart, a., faulkner, b. d., hoskinson, r. m., and khorana, h. g. (1967). the primary structure of yeast phenylalanine transfer rna. proc. natl. acad. sci. usa 57, 751-758. randerath, e., gupta, r. c., chia, l. l. s. y., chang, s. h., and randerath, k. (1979). yeast trna-leu(uag): purification, properties, and determination of the nucleotide sequence by radioactive derivative methods. eur. j. biochem. 93, 79-94. sanger, f., nicklen, s., and coulson, a. r. (1977)
key: cord-339093-mwxkvwaz authors: li, wei; schäfer, alexandra; kulkarni, swarali s.; liu, xianglei; martinez, david r.; chen, chuan; sun, zehua; leist, sarah r.; drelich, aleksandra; zhang, liyong; ura, marcin l.; berezuk, alison; chittori, sagar; leopold, karoline; mannar, dhiraj; srivastava, shanti s.; zhu, xing; peterson, eric c.; tseng, chien-te; mellors, john w.; falzarano, darryl; subramaniam, sriram; baric, ralph s.; dimitrov, dimiter s. title: high potency of a bivalent human vh domain in sars-cov-2 animal models date: 2020-09-04 journal: cell doi: 10.1016/j.cell.2020.09.007 sha: doc_id: 339093 cord_uid: mwxkvwaz novel covid-19 therapeutics are urgently needed. we generated a phage-displayed human antibody vh domain library from which we identified a high-affinity vh binder, ab8. the bivalent vh, vh-fc ab8, bound with high avidity to membrane-associated s glycoprotein and to mutants found in patients. it potently neutralized mouse-adapted sars-cov-2 in wild-type mice at a dose as low as 2 mg/kg and exhibited high prophylactic and therapeutic efficacy in a hamster model of sars-cov-2 infection, possibly enhanced by its relatively small size. electron microscopy combined with scanning mutagenesis identified ab8 interactions with all three s protomers and showed how ab8 neutralized the virus by directly interfering with ace2 binding. 
vh-fc ab8 did not aggregate and did not bind to 5300 human membrane-associated proteins. the potent neutralization activity of vh-fc ab8, combined with good developability properties and cross-reactivity to sars-cov-2 mutants, provides a strong rationale for its evaluation as a covid-19 therapeutic. the global outbreak of coronavirus disease 2019 (covid-19), caused by severe acute respiratory syndrome coronavirus 2 (sars-cov-2), requires rapid identification of therapeutics and vaccines. while many vaccines are in clinical development, the time to market can be relatively long and immunogenicity can be limited for high-risk groups (amanat and krammer, 2020). alternatively and complementarily, antibodies can be used as safe and effective prophylactics and therapeutics (pelegrin et al., 2015). convalescent plasma from covid-19 patients inhibited sars-cov-2 infection and alleviated symptoms of newly infected patients (casadevall and pirofski, 2020; rojas et al., 2020), suggesting that potent neutralizing monoclonal antibodies (mabs) may be even more effective. the sars-cov-2 genome shares more than 80% homology with that of sars-cov. similar to sars-cov, sars-cov-2 uses the spike (s) envelope glycoprotein to enter host cells. viral entry is initiated by the receptor binding domain (rbd) of the s protein binding to its receptor, angiotensin-converting enzyme 2 (ace2), leading to a conformational change of the s2 subunit and formation of a six-helix bundle, resulting in fusion between viral and host cell membranes (yan et al., 2020). the sars-cov rbd contains immune-dominant epitopes that can elicit neutralizing antibodies conferring protection against sars-cov infection (he et al., 2005). a recent bioinformatics study showed that the sars-cov-2 rbd has several b cell epitopes (grifoni et al., 2020). sars-cov-2 rbd-based immunogens were able to elicit neutralizing sera in animals (quinlan et al., 2020). thus, the sars-cov-2 rbd is a good target for developing potent neutralizing mabs. 
we and others have identified such potent neutralizing human mabs targeting the rbd of sars-cov (zhu et al., 2007) and the middle east respiratory syndrome coronavirus (mers-cov) (ying et al., 2014a). recently, several groups have reported the isolation of potent neutralizing antibodies from convalescent human donors, but all are in an immunoglobulin g1 (igg1) format with a molecular mass of about 150 kda (ju et al., 2020; rogers et al., 2020; shi et al., 2020; zost et al., 2020). antibody domains and fragments such as fab (fragment antigen binding, molecular weight of 50 kda), scfv (single-chain variable fragment, 30 kda) and vh (heavy chain variable domain, 15 kda) are attractive antibody formats as candidate therapeutics (nelson, 2010). for example, isotope-labeled antibody fragments are more suitable for bio-imaging due to their better tissue penetration and faster clearance compared to full-size antibodies (freise and wu, 2015). single antibody domains (sabd), e.g., camelid vhh (15 kda), exhibit strong antigen binding and high stability (harmsen and de haard, 2007). we and others have demonstrated that the human igg1 heavy chain variable domain (vh) can be engineered to achieve high stability and affinity to antigens (nilvebrant et al., 2016), as exemplified by the vh, m36.4, targeting the human immunodeficiency virus type 1 (hiv-1) envelope glycoprotein co-receptor binding site (chen et al., 2008a). the small size of the vh domain could improve therapeutic efficacy for infectious diseases such as covid-19 because of greater penetration to sites of infection. the conformation of the sars-cov-2 s trimer is dynamic, with only one rbd in the "up" conformation presenting neutralizing epitopes, while epitopes in the other two rbds may be masked. small vhs may achieve binding to the cryptic rbd epitopes during the dynamic "breathing" of the s trimer. 
in addition, vhs may have an advantage for treatment of respiratory virus infections because vhs could efficiently penetrate tissue, especially when using direct delivery through inhalation (detalle et al., 2016). to identify potent neutralizing vhs against sars-cov-2, we panned our large (10^11 clones) and diverse phage-displayed human vh antibody library against recombinant rbd. several vh binders were isolated and screened for their affinities, ace2 competition and stabilities. one of those vhs, ab8, in an fc (human igg1, crystallizable fragment) fusion format, showed potent neutralization activity and specificity against sars-cov-2 both in vitro and in two animal models. to our knowledge, this is the first report of high potency of a human antibody domain (vh) in two animal models of infection. we generated a large phage-displayed human vh library where heavy chain complementarity-determining regions (hcdr1, hcdr2, and hcdr3) were grafted into their cognate positions of a stable scaffold based on the germline vh3-23 (figure s1a). it was panned against recombinant rbd antigens with two different tags (avi-his and human igg1 fc tag), which were used sequentially to avoid phage enrichment to tags and related epitopes. the quality of the rbd used for panning was confirmed by ace2 binding (figure s1b and c). after three rounds of panning, a panel of vh binders was obtained. among the highest affinity binders, we selected one, vh ab8, which did not aggregate during a six-day incubation at 37°c as tested by dynamic light scattering (dls) (figure s1d). to increase the vh ab8 avidity and extend its in vivo half-life, it was converted to a bivalent antibody domain by fusion to the human igg1 fc (vh-fc ab8) (figure s1e). 
vh ab8 bound to sars-cov-2 rbd and s1 with half-maximal binding concentrations (ec50s) of 10 nm as measured by elisa (figure 1a and d) and an equilibrium dissociation constant (K_D) of 19 nm as measured by biolayer interferometry (blitz system) (figure 1b). the relatively fast dissociation rate constant (k_d = 4.1 × 10^-3 s^-1) was significantly (23-fold) decreased by the conversion to a bivalent fc fusion format (k_d = 1.8 × 10^-4 s^-1) (figure 1e), resulting in high avidity. vh-fc ab8 bound to sars-cov-2 rbd and the s1 subunit of the s protein with ec50s of 0.40 nm and 0.20 nm, respectively, and a K_D of 0.54 nm (figure 1e). it specifically bound to 293t cells expressing s, but not to control 293t cells (figure 1c and figure s2a). the binding of vh-fc ab8 was higher than that of igg1 cr3022, an anti-sars-cov antibody cross-reactive with sars-cov-2 (tian et al., 2020). the half-maximal binding concentration of vh-fc ab8 measured by facs (fc50 = 0.07 nm) was lower than that of recombinant human ace2-fc (fc50 = 0.52 nm), indicating stronger binding (figure 1f). these data demonstrate that ab8 selected by an isolated rbd can bind to cell surface-associated native s trimer. the binding of vh-fc ab8 to the s protein was significantly improved compared to that of vh ab8 through the avidity effect. competition with human ace2 for binding to rbd is a surrogate indicator for antibody neutralization activity. vh-fc ab8 outcompeted human ace2-fc with a half-maximal inhibitory concentration (ic50) of 1.0 nm (figure 2a). note that vh-fc ab8 was much more effective in outcompeting ace2-fc than vh ab8, consistent with its enhanced binding. ace2 also blocked vh ab8 binding to rbd (figure s2b) and to cell surface-associated s (figure s2c). vh-fc ab8 also significantly decreased the kinetics of ace2 binding as measured by blitz (figure 2b). vh-fc ab8 did not bind to the sars-cov rbd (figure 2c) and did not compete with cr3022 for binding to rbd (figure 2d). 
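the 23-fold change in dissociation rate quoted above can be reproduced from the two reported rate constants. the sketch below also converts each rate into a half-life of the complex, assuming simple first-order dissociation (t1/2 = ln 2 / k); that model, and the function names, are assumptions introduced here for illustration and are not taken from the study. for the bivalent format the rate is an apparent one.

```python
import math


def fold_change(rate_before, rate_after):
    """Ratio of two dissociation rate constants (dimensionless)."""
    return rate_before / rate_after


def half_life_s(k_off):
    """Half-life in seconds of a complex decaying by first-order dissociation."""
    return math.log(2) / k_off


# dissociation rate constants reported in the text, in s^-1
K_OFF_VH = 4.1e-3      # monovalent vh ab8
K_OFF_VH_FC = 1.8e-4   # bivalent vh-fc ab8 (apparent rate)

fold = fold_change(K_OFF_VH, K_OFF_VH_FC)

if __name__ == "__main__":
    print(f"fold decrease in dissociation rate: {fold:.1f}")
    print(f"half-life, vh ab8:    {half_life_s(K_OFF_VH):.0f} s")
    print(f"half-life, vh-fc ab8: {half_life_s(K_OFF_VH_FC):.0f} s")
```

under these assumptions the monovalent complex would persist for minutes and the bivalent one for roughly an hour, which is one way to read the statement that bivalency results in high avidity.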
the cr3022 epitope is located in a conserved region on the rbd core domain distal from the ace2 binding interface, as seen in the crystal structure of the fab cr3022-rbd complex. these results indicate that the ab8 epitope may overlap with the ace2 binding site on rbd. currently, nine prevalent rbd mutants have been found in covid-19 patients (priyanka et al., 2020). six of these mutations (f342l, n354d, n354d/d364y, v367f, r408i, w436r) are located in the rbd core domain and three (k458r, g476s and v483a) are in the receptor binding motif (rbm) (figure 3a). vh-fc ab8 bound to all mutants similarly to wild-type rbd as measured by elisa (figure 3b). to map the ab8 epitope, we also generated several mutations in positions that are not conserved relative to sars-cov and that span the footprint of ace2 on the rbm (n439a, g446l, l455a, f456a, a475i, f486a, q493a, q498a, n501a, y505a) (figure 3c). most of these mutants retained vh-fc ab8 binding except f486a, f456a and a475i (figure 3d and 3e). the f486a mutation significantly decreased binding without affecting the overall rbd conformation (figure s2c and s2d), indicating that f486 directly interacts with ab8. the f456a and a475i mutations decreased the binding by 15% and 40%, respectively, but they also affected the rbd conformation (figure s2c and s2d). these results suggest that a portion of the vh ab8 epitope could be in the rbm distal loop tip where f486 is located (figure 3f). to explore structural aspects of sars-cov-2 neutralization by vh ab8, we performed negative stain electron microscopic analysis of the complex formed between the s protein ectodomain and vh ab8 or soluble ace2 (figure 4). 
the density maps showed that the s protein complexes with both vh ab8 and ace2 were in a quaternary conformation in which two of the protomers in the trimer are in the "down" conformation with the third one in the "up" conformation (figures 4a and 4b), similar to the quaternary conformation of the reported ace2-bound s ectodomain (pdb id: 6vyb) (walls et al., 2020). one molecule of vh ab8 was observed bound to each rbd domain (figure 4a). in the ace2-s complex, one molecule of ace2 was bound to the s protein trimer, straddling one "up" and one "down" rbd region (figure 4b). there appears to be a noticeable shift of the "up" rbd domain when it is bound to vh ab8 (figure 4a). this shift is not observed when ace2 is bound to the trimer (figure 4b). superposition of the two density maps reveals that the binding site of vh ab8 directly overlaps with that of ace2, precluding simultaneous occupancy on the s protein ectodomain (figure 4c). we also found that when ace2 was added subsequent to the addition of vh ab8, only the vh ab8-bound state was observed, further confirming the competition of ace2 with vh ab8. to better understand the spatial relationship between the site of vh ab8 binding and that of ace2 binding, we created a molecular model for the ace2-bound s trimer by aligning the rbd region of the crystal structure of sars-cov-2 rbd-bound ace2 (pdb id: 6m0j) (lan et al., 2020) to the "up" rbd region in the cryo-em structure of the trimer (pdb id: 6yvb) (wrapp et al., 2020). superposition of this chimeric structure with the density map of vh ab8-bound s protein trimers reveals that the bound ace2 has extensive overlap with the space occupied by bound vh ab8 (figure 4d). the direct spatial overlap between bound vh ab8 and ace2 provides a structural mechanism for the observed effect of ab8 on blocking ace2 binding. 
the structural findings also showed that the rbm distal loop, which has f486 at its tip, is directly covered by the footprint of the bound vh ab8, consistent with the epitope mapping results showing that f486 is a direct contacting residue for ab8. we used four different assays to evaluate vh-fc ab8-mediated inhibition of sars-cov-2 infection in vitro: a β-galactosidase (β-gal) reporter gene-based quantitative cell-cell fusion assay (xiao et al., 2003); an hiv-1 backbone-based sars-cov-2 pseudovirus assay; and two different replication-competent virus neutralization assays (a luciferase reporter gene assay and a microneutralization (mn)-based assay) (scobey et al., 2013; yount et al., 2003). vh-fc ab8 inhibited cell-cell fusion much more potently than vh ab8 (figure 5a). the inhibitory activity of vh-fc ab8 was also higher than that of ace2-fc. the control anti-mers-cov antibody igg1 m336 did not show any inhibitory activity. vh-fc ab8 neutralized pseudotyped sars-cov-2 virus (ic50 = 0.03 µg/ml) more potently than ace2-fc (ic50 = 0.40 µg/ml) and vh ab8 (ic50 = 0.65 µg/ml) (figure 5b). the pseudovirus neutralization ic50 for ace2-fc in our assay is comparable to the one reported by lei et al. (0.03-0.1 µg/ml) (lei et al., 2020). interestingly, the maximum neutralization by vh ab8 was only 50% compared to the 100% by vh-fc ab8 and ace2-fc; incomplete neutralization was also observed for another antibody, s309 (pinto et al., 2020). the complete neutralization by vh-fc ab8 and ace2-fc emphasizes the role of bivalency and related avidity in neutralization (klasse and sattentau, 2002). furthermore, in the reporter gene assay vh-fc ab8 neutralized live sars-cov-2 with an ic50 of 0.04 µg/ml (figure 5c), which is much lower than that for ace2-fc (ic50 = 6.1 µg/ml) and vh ab8 (ic50 = 29 µg/ml). 
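the relationship between an ic50 and a neutralization plateau can be illustrated with a hill-type dose-response sketch. the ic50 values and the approximate 50% and 100% plateaus are those quoted above for the pseudovirus assay; the hill slope of 1, the reading of ic50 as the concentration giving half of the plateau, and all function names are assumptions made here for illustration, not the fitting procedure used in the study.

```python
def fraction_neutralized(conc, ic50, top=1.0, hill=1.0):
    """Hill-type dose-response curve.

    conc and ic50 share the same units (ug/ml here); `top` is the plateau
    (maximum fraction neutralized). ic50 is treated as the concentration
    giving half of `top`, which is an assumption of this sketch.
    """
    if conc <= 0:
        return 0.0
    return top * conc**hill / (ic50**hill + conc**hill)


# pseudovirus assay values quoted in the text (ic50 in ug/ml, plateau as a fraction)
PSEUDOVIRUS = {
    "vh-fc ab8": {"ic50": 0.03, "top": 1.0},
    "ace2-fc": {"ic50": 0.40, "top": 1.0},
    "vh ab8": {"ic50": 0.65, "top": 0.5},
}


def potency_ratio(reference, other):
    """How many times lower the reference ic50 is than the other ic50."""
    return PSEUDOVIRUS[other]["ic50"] / PSEUDOVIRUS[reference]["ic50"]


if __name__ == "__main__":
    for name, p in PSEUDOVIRUS.items():
        curve = [round(fraction_neutralized(c, p["ic50"], p["top"]), 2)
                 for c in (0.01, 0.1, 1.0, 10.0)]
        print(name, curve)
    print("vh-fc ab8 vs ace2-fc:", round(potency_ratio("vh-fc ab8", "ace2-fc"), 1))
```

with a plateau of 0.5 the monovalent curve never exceeds 50% neutralization however high the concentration, which is the behaviour the text contrasts with the bivalent formats.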
ace2-fc seemed to be much less potent against the live virus than against the pseudovirus, which has also been observed by others (ic50 = 12.6 µg/ml) and may relate to the s expression levels and rbd/s conformation on the virus surface. we also confirmed the high live virus neutralization potency of vh-fc ab8 by a microneutralization (mn) assay: 100% neutralization (nt100) at 0.1 µg/ml (figure 5d). the nt100 from the mn assay (0.1 µg/ml) was close to the ic100 (0.2 µg/ml) from the reporter gene assay, suggesting consistency in the live virus neutralizing activity of vh-fc ab8 obtained with two independent assays at two different laboratories. these results suggest that vh-fc ab8 is a potent neutralizer of sars-cov-2, which correlates with its strong competition with ace2 for binding to rbd. to evaluate the prophylactic efficacy of vh-fc ab8 in vivo, we used a recently developed mouse ace2-adapted sars-cov-2 infection model, in which wild-type balb/c mice are challenged with sars-cov-2 carrying two mutations, q498t/p499y, at the ace2 binding interface in the rbd. it was shown that in this model, the aged balb/c mice exhibited more clinically relevant phenotypes than those seen in hace2 transgenic mice. groups of 5 mice each were administered 36, 8, or 2 mg/kg vh-fc ab8 prior to high titer (10^5 pfu) sars-cov-2 challenge, followed by measurement of virus titer in lung tissue 2 days post infection. vh-fc ab8 effectively inhibited sars-cov-2 in the mouse lung tissue in a dose-dependent manner (figure 6a). there was complete neutralization of infectious virus at the highest dose of 36 mg/kg, and a statistically significant reduction by 1000-fold at 8 mg/kg. remarkably, even at the lowest dose of 2 mg/kg it significantly decreased virus titer by 10-fold (two-tailed, unpaired t test, p = 0.0075). 
to exclude possible effects of residual ab8 on viral titration, we performed another experiment in which mouse lungs were perfused with 10 ml of pbs before harvesting for titration. the perfusion did not affect to any significant degree the infectious virus in the lungs ( figure 6b ). the v h -fc ab8 completely neutralized the virus in the lungs at 36 mg/kg and significantly reduced infectious virus at 8 mg/kg. v h -fc ab8 also reduced viral rna in the lungs ( figure 6c ). these results demonstrate the neutralization potency of v h -fc ab8 in vivo. they also suggest that the double mutations q498t/p499y on rbd did not influence v h -fc ab8 binding and contribute to the validation of the mouse adapted sars-cov-2 model for evaluation of neutralizing antibody efficacy. recently hamsters were demonstrated to recapitulate clinical features of sars-cov-2 infection (chan et al., 2020) (imai et al., 2020) . to evaluate the v h -fc ab8 efficacy in hamsters, it was intraperitoneally administered either 24 hours before (prophylaxis) or 6 hours after (therapy) intranasal 10 5 tcid 50 virus challenge. in the therapeutic group, the rationale for administration of the antibody six hours post viral infection is based on the replication cycle length of 5-6 hours after initial infection for sars-cov in veroe6 cells (keyaerts et al., 2005) . six hours after challenge with a high dose of 10 5 tcid 50 , approximately the same number of susceptible cells could become infected and likely produce much more infectious virus, which would need to be neutralized by the antibody to prevent subsequent cycles of infection. nasal washes and oral swab at 1, 3, 5 days post infection (dpi) and different lung lobes at 5 dpi were collected. v h -fc ab8 decreased viral rna by 1.7 log in the lung when administered prophylactically. the lung viral rna decrease in the therapeutic groups was slightly lower (by 1.2 log) ( figure 6d) . 
interestingly, the viral rna load in the therapeutic groups was to some extent tissue location dependent (figure 6f). the variation of the viral load in different lung lobes may relate to nonuniform antibody transport and viral spread inside the lung. remarkably, v h -fc ab8 alleviated hamster pneumonia and reduced the viral antigen in the lung (h&e staining, figure 7a and c; immunohistochemistry, figure 7b and d). the control hamsters exhibited severe interstitial pneumonia characterized by extensive inflammatory cell infiltration, presence of type ii pneumocytes, alveolar septal thickening and alveolar hemorrhage. both prophylactic and therapeutic treatment with v h -fc ab8 reduced the lesions of alveolar epithelial cells, focal hemorrhage and inflammatory cell infiltration. v h -fc ab8 also reduced the shedding from mucosal membranes, including in nasal washes and oral swabs (figure s4). the decrease in viral rna in nasal washes and oral swabs was not as large as the decrease observed in the lung tissue, similar to a recent finding in hamsters (imai et al., 2020). overall, the prophylactic treatment was more effective than the therapeutic treatment in decreasing viral load in nasal washes and oral swabs. notably, prophylactic administration of v h -fc ab8 effectively reduced the infectious virus in the oral swab at 1 dpi, while the post-exposure treatment did not (figure s4c and g). interestingly, viral reduction (except the viral titer in the oral swab at 1 dpi) was more effective at 3 and 5 dpi compared to that at 1 dpi, likely due to the infection peak occurring before day 3 as reported in hamsters (sia et al., 2020). a striking finding is that v h -fc ab8 given therapeutically at a dose as low as 3 mg/kg can still decrease viral loads in the lung, nasal washes and oral swabs (figure s5).
we measured the v h -fc ab8 concentrations at both doses (10 and 3 mg/kg) in the sera at 1 dpi and 5 dpi in the post-exposure treatment groups ( figure s5c ). the higher dose (10 mg/kg) resulted in higher antibody concentration and better inhibitory activity than the lower dose (3 mg/kg). the relatively high concentration of v h -fc ab8 five days after administration also indicates good pharmacokinetics. furthermore, we also compared the v h -fc ab8 concentration in both the sera and lung with that of igg1 ab1, which has a similar affinity to sars-cov-2 and similar degree of competition with the receptor ace2 as v h -fc ab8 . we found that the concentration of v h -fc ab8 in hamster sera is significantly higher than that of igg1 ab1 at 1 and 5 dpi after postexposure administration of the same dose of 10 mg/kg ( figure 7e ), possibly indicating more effective delivery of v h -fc ab8 from the peritoneal cavity to the blood than that of igg1 ab1. we also found that the v h -fc ab8 concentration in all hamster lung lobes was higher than that of the igg1 ab1 ( figure 7f ), suggesting that v h -fc ab8 appears to penetrate the lung tissue more effectively than igg1 ab1. these results indicate that the in vivo delivery of v h -fc ab8 may be more effective than that of full-size antibodies in an igg1 format. the v h -fc ab8 propensity for aggregation was measured at 37°c by dynamic light scattering (dls), which detects particle size distributions in the nanometer range (stetefeld et al., 2016) . it displayed a single peak at 11.5 nm which is the size of a monomeric v h -fc protein ( figure s6a ). the absence of large-size peaks corresponding to large molecular weight species (aggregates) in solution, indicates that v h -fc ab8 is highly resistant to aggregation at high concentration (4 mg/ml) and relatively long times of incubation (6 days) at 37°c. 
the v h -fc ab8 propensity for aggregation was also evaluated by size exclusion chromatography (sec), which showed that >96% of v h -fc ab8 was eluted in a peak at a position corresponding to a monomeric state with a molecular weight of 80 kda ( figure s6b ). antibody nonspecificity and polyreactivity can be an obstacle for developing an antibody into a clinically useful therapeutic. polyreactivity may not only cause off-target toxicities and interfere with normal cellular functions, but may also reduce antibody half-life (chuang et al., 2015) . to test for potential polyreactivity of v h -fc ab8, a membrane proteome array (mpa) platform was used, in which 5,300 different human membrane protein clones were separately overexpressed in 293t cells in a matrix array achieving a high-throughput detection of binding by facs. v h -fc ab8 did not bind to any of those proteins ( figure s6c ), demonstrating its lack of polyreactivity and nonspecificity. interestingly, we did not detect v h -fc ab8 binding to the human fcγria, which is probably due to the relatively low expression level of fcγria on hek-293t cell surface without concomitant expression of the common γ chain (van vugt et al., 1996) . in addition, we found that v h -fc ab8 bound to the fcγrs much weaker than igg1 (figure s7 ), likely due to the different conformation in the lower hinge region for fc fusion proteins compared to that of igg1s (ying et al., 2014b) . for the fc fusion proteins (even with the same hinge sequence as igg1), binding to fcγrs may be different from that of igg1, and can be affected by the fusion partners (lagassé et al., 2019) . the importance of antibody binding to fcγrs for therapeutic or prophylactic efficacy or toxicity in sars-cov-2 infection is unknown. neutralizing mabs are promising for prophylaxis and therapy of sars-cov-2 infections. 
recently, many potent neutralizing antibodies from covid19 patients were identified that neutralize pseudovirus with ic 50 s ranging from 1 to 300 ng/ml, and replication-competent sars-cov-2 with ic 50 s from 15 to 500 ng/ml ju et al., 2020; rogers et al., 2020; shi et al., 2020; zost et al., 2020) . by comparison, the v h -fcab8 reported here exhibited comparable or better neutralizing potency against sars-cov-2 pseudovirus and live virus (ic 50 s of 30 ng/ml and 40 ng/ml respectively). of note, ic 50 s can vary widely between different assays and laboratories because there is no generally accepted standardized assay. in addition, there are many factors that contribute to potency and efficacy in vivo. animal models are a more comprehensive and likely more reliable predictor of potential efficacy in humans than in vitro neutralization assays. to our knowledge v h -fc ab8 is the first human antibody domain whose activity was validated in two animal models. in the mouse ace2 adapted sars-cov-2 infection model, v h -fc ab8 significantly decreased infectious virus by 10-fold at 2 days post infection even at a very low dose of 2 mg/kg ( figure 6a ). it also exhibited both prophylactic and therapeutic efficacy in a hamster model. it not only reduced the viral load in the lung and alleviated pneumonia; but it also reduced shedding in the upper airway (nasal washes and oral swab), which could potentially reduce transmission of sars-cov-2. impressively, v h -fc ab8 was active therapeutically even at 3 mg/kg. the finding that v h -fc ab8 persisted for 4 days post administration at significant levels indicates that the pharmacokinetics of v h -fc ab8 is comparable to that of a full size antibody; the half-lives of fc fusion proteins were reported to vary from those of igg1s and can range from hours to days (unverdorben et al., 2016) . 
the molecular weight of v h -fc ab8 (80 kda) is half that of full-size igg1, which suggests an advantage in that smaller quantities need to be produced, compared to igg1s, to reach a similar number of molecules and similar efficacy. in addition, it was shown that decreasing a binder's size exponentially increases its diffusion through normal and tumor tissues (jain, 1990). thus, decreasing the size two-fold can increase diffusion through tissues by four-fold. we found that after administration at the same dose, the concentration of v h -fc ab8 was higher than that of igg1 ab1 in both hamster sera and lung tissue. this result suggests that v h -fc ab8 diffusion from the peritoneal cavity to the blood and penetration of the lung may be faster than that of igg1 ab1. this may further explain its efficacy at low doses in animals. although the low dose showed efficacy in the small animal models, it should be noted that in humans higher doses could be required to achieve a comparable degree of efficacy. another caveat is that in the hamster post-exposure experiment, the v h -fc ab8 was administered at a time (six hours) when the first round of virus replication was likely completed (keyaerts et al., 2005), but before the infection peak at 1-2 days (sia et al., 2020). because it inhibits infection of new cells, its administration at around the infection peak or later may not be as effective unless it also kills infected cells in vivo, which is under investigation. recently, antibody domains including human v h and camelid v h h were reported to have varying neutralization potency (chi et al., 2020; sun et al., 2020; wrapp et al., 2020; wu et al., 2020a). compared to those domains, v h -fc ab8 is unique in terms of potency, aggregation resistance and specificity. v h -fc ab8 exhibited good developability properties including stability at high concentrations and long incubation at 37°c, as well as absent or very low aggregation.
in addition, v h -fc ab8 did not bind to the human cell line 293t even at high concentration (1 µm) which is about 1754-fold higher than its k d indicating absence of non-specific binding to many membraneassociated human proteins. a similar result was obtained by the membrane protein array assay showing that v h -fc ab8 did not bind to any of 5,300 human membrane-associated proteins, indicating its lack of non-specificity and thus low potential for off-target toxicity when used in vivo. besides, unlike camel v h hs, the v h ab8 sequence is fully human and therefore likely less immunogenic than that of camelid v h hs. multiple structures are now available for the sars-cov-2 s protein trimer in complex with various neutralizing antibodies, offering insight into antigenic epitopes and inhibitory mechanisms critical for s protein neutralization. epitopes on the sars-cov-2 s protein rbd have emerged as effective targets, as evidenced by the action of several rbd binding antibodies including cr3022, b38, c105, cb6, h014, and s309 (barnes et al., 2020; lv et al., 2020; pinto et al., 2020; shi et al., 2020; wu et al., 2020b) . while b38, c105, and cb6 directly compete with ace2 for binding sites on the rbd surface, h014 occupies a position distinct from these binding sites, precluding ace2 binding via steric inhibition . s309 targets the rbd of the s protein both in closed and open s protein conformations, exhibiting a different mechanism of neutralization (pinto et al., 2020) . a recent study of the structure of the s protein trimer in complex with the nanobody h11-d4 (pdb id: 6z43) revealed full occupancy of the nanobody on all three rbds in a "one up and two down" conformation (huo et al., 2020) , similar to what we report here. our structural analysis demonstrates that the location of the v h ab8 bound to the trimeric s ectodomain directly overlaps the region that would be occupied by ace2 when bound to the s protein. 
the ace2 blocking is likely the major mechanism of the v h -fc ab8 neutralizing activity, which is significantly augmented by avidity effects due to its bivalency. the narrow neutralization concentration range in the live virus neutralization (10-200 ng/ml for 0%-100% neutralization) (figure 5d) indicates a plausible cooperative neutralization mechanism, probably due to the synergistic binding of v h molecules in v h -fc ab8 to rbds. due to its small size, v h may facilitate targeting occluded epitopes on rbd that are otherwise inaccessible to full-length iggs, which is important because the sars-cov-2 s protein is conformationally heterogeneous, exposing neutralizing epitopes to varying degrees. the structural analysis shows that v h ab8 is able to simultaneously target all three rbd epitopes in both "up" and "down" conformations, which may provide a structural basis for a unique cooperative neutralization mechanism for v h -fc ab8. the long flexible linker between v h and fc in v h -fc ab8 may allow two v h molecules to simultaneously bind two protomers in the same s trimer or to cross-link two protomers from different s trimers. the ab8 epitope is distal to the cr3022 epitope, explaining its lack of competition with cr3022. the ab8 contact residue f486 (l472 in sars-cov) is not conserved, which likely explains its lack of cross-reactivity to sars-cov. from the gisaid and ncbi databases, we found nine mutations in rbd with relatively high frequencies in currently circulating sars-cov-2. six of them are in the core domain (f342l, n354d, n354d/d364y, v367f, r408i and w436r) and three in the rbm (k458r, g476s, v483a). the core domain mutations are far away from the ab8 epitope, thus these mutations do not affect v h -fc ab8 binding to rbd.
those three rbm mutations also did not affect ab8 binding although they are close to the ab8 epitope, suggesting that they may not affect ab8 neutralizing activity, although neutralization of whole virus carrying these mutations is needed to definitively demonstrate this. interestingly, v h -fc ab8 effectively inhibited the mouse ace2 adapted sars-cov-2 with a q498t/p499y mutation in rbd, indicating that this double mutation also does not affect v h -fc ab8 binding to rbd. these results suggest that v h -fc ab8 may be a broadly cross-reactive sars-cov-2 neutralizing antibody. in conclusion, we identified a fully human antibody v h domain that shows strong competition with ace2 for binding to rbd and potent neutralization of sars-cov-2 in vitro and in two animal models. this potent neutralizing activity combined with its specificity and good developability properties warrants its further evaluation for prophylaxis and therapy of sars-cov-2 infection. our elucidation of its unique epitope and mechanism of neutralization could also help in the discovery of more potent inhibitors and vaccines. hamsters were bled at 1 and 5 dpi for measuring antibody concentrations in sera by sars-cov-2 s1 elisa. sera were diluted 1:100 and binding was detected by using goat anti human igg-hrp. (f) viral rna levels in different lung lobes. rna quantity is presented as tcid 50 equivalents. experiments were performed in duplicate and the error bars denote ± sd, n = 2. further information and requests for resources and reagents, including antibodies, viruses, plasmids and proteins, should be directed to and will be fulfilled by the lead contact, dimiter dimitrov (mit666666@pitt.edu).
all reagents will be made available on request after completion of a material transfer agreement. the antibody nucleotide sequence has been deposited in genbank under accession number mt943599. the antibody is only allowed for non-commercial use. all data supporting the findings of this study are available within the paper and are available from the corresponding author upon request. vero e6 (crl-1586, american type culture collection (atcc)) and 293t (atcc) cells were cultured at 37°c in dulbecco's modified eagle medium (dmem) supplemented with 10% fetal bovine serum (fbs), 10 mm hepes ph 7.3, 1 mm sodium pyruvate, and 100 u/ml of penicillin-streptomycin. 293t cells stably expressing sars-cov-2 s or human ace2 were cultured in dmem containing 200 µg/ml zeocin. hek293f and expi293f cells were cultured in freestyle 293 serum free medium (thermofisher, cat#12338018) and expi293™ expression medium (thermofisher, cat# a1435103), respectively. the sars-cov-2 spike pseudotyped hiv-1 backbone virus was packaged in 293t cells after transfection with the pnl4-3.luc.re and pcdna3.1 s plasmids. sars-cov-2 (us_wa-1/2020) and sars-cov2/canada/on/vido-01/2020, obtained from the centers for disease control and prevention, were propagated in vero e6 cells. the recombinant sars-cov-2-seattlenluc virus and the mouse ace2 adapted sars-cov-2 virus (carrying a q498t/p499y mutation in rbd), both recovered by reverse genetics, were produced in vero e6 cells. all work with infectious sars-cov-2 was performed in institutional biosafety committee approved bsl3 facilities using appropriate positive pressure air respirators and protective equipment. the recombinant proteins sars-cov-2 rbd-his, rbd mutants, rbd-fc and ace2-hfc were subcloned into pcdna3.1 expression plasmids and expressed in expi293f cells. proteins with a his tag were purified by ni-nta affinity chromatography and proteins with an fc tag were purified by protein a chromatography.
protein purity was estimated as >95% by sds-page and protein concentration was measured spectrophotometrically (nanovue, ge healthcare). the v h ab8 antibody was identified by panning of the phage library. v h -fc ab8 was constructed by fusing v h to human igg1 fc with the native igg1 hinge. igg1 ab1 was obtained by our lab through panning of a fab phage library. mers-cov-specific igg1 m336 and sars-cov antibody igg1 cr3022 sequences from other groups were subcloned into the pdr12 plasmid for expression. v h ab8 (in a phagemid pcomb3x with a flag tag) was expressed in hb2151 e. coli and purified by ni-nta affinity chromatography. all other igg1s were expressed in expi293 cells and purified with protein a chromatography. for the mouse model, balb/c mice purchased from envigo (balb/cannhsd, stock# 047, immunocompetent, 11-12 months of age, female) were used for all experiments. they are drug/test naïve and negative for pathogens. bedding is biofresh with crinkle bedding added. hamsters have access to food and water ad libitum. food is lab diet 5p00 prolab rmh300. cages are changed weekly or as needed and spot cleaned. for the experiment, hamsters were intraperitoneally treated with v h -fc ab8 either 24 hrs before (prophylaxis) or 6 hrs after (therapy) intranasal challenge with 1×10^5 tcid 50 of sars-cov-2. nasal washes and oral swabs were collected at days 1, 3 and 5 post infection (dpi). hamsters were bled at 1 and 5 dpi. all hamsters were euthanized at 5 dpi. at euthanasia, lungs were collected for rna isolation. for viral titer determination, a vero e6 cell tcid 50 assay was used. for testing viral rna, viral rna rt-qpcr was used. for testing antibody concentration in sera and lung, sars-cov-2 s1 elisa was used. for histopathology, 10% formalin fixed and paraffin embedded tissues were processed with either hematoxylin and eosin stain (h&e) or immunohistochemistry (ihc). lung lobes were scored based on pathology using microscopy. cr3022.
the sars-cov-2 s and anti-sars-cov antibody igg1 cr3022 genes were synthesized by idt (coralville, iowa). mers-cov-specific igg1 m336 antibody was expressed in mammalian cells as described previously (ying et al., 2014a). briefly, igg1 m336 light chain and heavy chain fd were subcloned into the pdr12 vector containing dual promoters and an igg1 fc cassette. the recombinant plasmid was sequenced and transfected into expi293 cells for expression. the human angiotensin converting enzyme 2 (ace2) gene was ordered from origene (rockville, md). the rbd domain (residues 330-532), s1 domain (residues 14-675) and ace2 (residues 18-740) genes were cloned in frame to human igg1 fc in the mammalian cell expression plasmid pcdna3.1. the rbd protein with an avitag followed by a 6×his tag at the c-terminus was subcloned similarly. these proteins were expressed with the expi293 expression system (thermo fisher scientific) and purified with protein a resin (genscript) or nickel-nitrilotriacetic acid (ni-nta) resin (thermo fisher scientific). the fab cr3022 antibody gene with a his tag was cloned into the pcat2 plasmid (developed in house) for expression in hb2151 bacteria, and the protein was purified with ni-nta resin. protein purity was estimated as >95% by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (sds-page) and protein concentration was measured spectrophotometrically (nanovue, ge healthcare). unlike camel v h hs, which naturally evolved to be autonomously stable, human v h is usually unstable and prone to aggregation in the absence of v l (li et al., 2016; nguyen et al., 2000). however, human v h can be selected or engineered for high stability and solubility. to facilitate identification of stable v h binders, we chose engineered germline v h 3-23 as our library scaffold (chen et al., 2008b).
our human v h phage display library was made by grafting heavy chain cdr1, 2 and 3 genes derived from 12 healthy donors' peripheral blood mononuclear cells (pbmcs) and splenocytes (takara, cat. no. 636525) into their cognate positions of a stable scaffold (based on the germline v h 3-23) in a manner similar to the method we previously described but without mutagenesis of cdr1 (chen et al., 2008a). briefly, cdrs were pcr-amplified by using primers with degenerate adaptors covering the cdr edge regions from diverse v h families at one end, and with sequences annealing to the v h 3-23 framework (fr) regions at the other end. the pcr products were then assembled by overlap extension pcr using primers with homologous ends. the whole v h was assembled by overlapping the fr1-cdr1-fr2-cdr2 and fr3-cdr3-fr4 fragments. after assembly, the v h fragment was digested with sfi i and ligated into sfi i linearized pcomb3x phagemid. the recombinant phagemid was then purified, desalted and concentrated for electroporation of tg1 bacteria, from which the v h phage particles were rescued and produced. the library size was determined by titrating transformants. the library quality (diversity) was checked by sanger sequencing hundreds of randomly picked v h clones and was also evaluated by panning against diverse antigens. this library contains a very large number of clones (10^11). for panning, the v h library was alternately panned against biotinylated rbd-his and rbd-fc proteins. rbd biotinylation occurred through biotin ligase (bira) mediated enzymatic conjugation of a single biotin on the avitag (glndifeaqkiewhe) (fairhead and howarth, 2015). the panning was performed for 3 rounds with input antigens of 10 µg rbd-his, 2 µg rbd-fc and 0.5 µg rbd-his for the 1st, 2nd and 3rd rounds, respectively. the panning process began with incubation of antigens with 10^12 v h phage particles followed by washing with phosphate-buffered saline (pbs) containing 0.1% tween-20.
bound phage pulled down by streptavidin-m280-dynabeads were rescued by log-phase tg1 cells with the m13ko7 helper phage. after the 3 rd round panning, positive clones were selected by soluble expression monoclonal (sem) elisa followed by sequencing (chen et al., 2008b) . v h binders were further screened for their binding affinity, stability and ace2 competition. for conversion to fc-fusion, the v h gene was subcloned into psectag b vector containing human igg1 fc fragment. v h -fc ab8 was expressed as described above. enzyme-linked immunosorbent assays (elisas). for detection of rbd biotinylation efficacy, horseradish peroxidase (hrp) conjugated streptavidin was used. for conformation of function of rbd-his after biotinylation, 100 ng ace2-fc was coated into the plates followed by addition of serially diluted biotinylated rbd-his. hrp conjugated streptavidin was used for detection. for other elisas, the sars-cov-2 rbd (residues 330-532) protein was coated on 96-well plates (costar) at 100 ng/well in pbs overnight at 4 o c. for screening sem elisa, clones randomly picked from the infected tg1 cells were incubated with immobilized antigen. bound phages were detected with hrp-conjugated mouse anti-flag tag ab (sigma-aldrich). for the v h -fc binding assay, hrpconjugated goat anti-human igg fc (sigma-aldrich) was used for detection. for the competition elisa with hace2, 2 nm of human ace2-mouse fc was incubated with serially diluted v h , or v h -fc, and the mixtures were added to rbd coated wells. after washing, bound ace2-mouse fc was detected by hrp-conjugated anti mouse igg (fc specific) (sigma-aldrich). for evaluation of ace2 blocking of v h ab8 binding to rbd, 10 nm v h ab8 was incubated with coated rbd in the presence of various concentration of ace2-his (sino biological), and the bound v h ab8 was detected by hrp conjugated anti flag antibody. 
for evaluation of conformational changes of the epitope mapping rbd mutants, we used a mouse polyclonal anti sars-cov-2 rbd antibody (sino biological, cat. no. 40592-mp01) and the human igg1 cr3022 antibody. for measuring the binding of v h -fc ab8 to rbd mutants, 100 ng rbd mutant was coated on 96-well plates and incubated with v h -fc ab8, with binding detected by using hrp conjugated anti human fc antibody. to evaluate the binding of v h -fc ab8 and igg1 ab1 to human fcγrs, recombinant human fcγria, iia and iiia were coated on 96-well plates followed by addition of biotinylated v h -fc ab8 and igg1 ab1. binding was detected by streptavidin-hrp. color was developed with 3,3′,5,5′-tetramethylbenzidine (tmb, sigma) and the reaction was stopped with 1 m h 2 so 4 , followed by recording absorbance at 450 nm. experiments were performed in duplicate and the error bars denote ± 1 sd. blitz. antibody affinities and avidities were analyzed by biolayer interferometry on a blitz instrument (fortebio, menlo park, ca). for measuring v h ab8 affinity, the rbd-fc was mounted on the protein a sensor (fortebio: 18-5010). 125 nm, 250 nm and 500 nm v h ab8 were used for association. for measuring avidity of v h -fc ab8, biotinylated rbd-fc was immobilized on streptavidin biosensors (fortebio: 18-5019) for 2 min and equilibrated with dulbecco's phosphate-buffered saline (dpbs) (ph = 7.4) to establish baselines. 50 nm, 100 nm and 200 nm v h -fc ab8 were chosen for association. the association was monitored for 2 min and then the antibody was allowed to dissociate in dpbs for 4 min. the association rate constant (k a) and dissociation rate constant (k d) were derived from sensorgram fitting and used to calculate the equilibrium dissociation constant (K D = k d / k a). for the competitive blitz, 500 nm v h -fc ab8 was loaded onto the rbd-fc coated sensor for 300 s to reach saturation followed by dipping the sensor into a 100 nm ace2-fc or fab cr3022 solution in the presence of 500 nm v h -fc ab8. the association was monitored for 300 s.
the signals from 100 nm hace2 or cr3022 binding to the rbd-fc coated sensor in the absence of v h -fc ab8 were independently recorded in parallel. competition was determined by the ratio of the signal in the presence of v h -fc ab8 to the signal in the absence of v h -fc ab8 (a ratio < 0.7 was considered competitive) (wu et al., 2020a). (agilent, cat. no. 200521). mutants were expressed and purified according to the abovementioned rbd purification procedures. elisa was used to evaluate the binding of these mutants compared to the wild type rbd. a. expression and purification. the codon optimized sars-cov-2 2p s protein ectodomain construct (genbank: yp_009724390.1) was c-terminally tagged with 8xhis and a twin strep tag and cloned into the mammalian expression vector pcdna 3.1 (synbio). hek293f cells were grown in suspension culture using freestyle media (thermofisher) at 37 °c in a humidified co 2 incubator (8% co 2 ). cells were transiently transfected at a density of 1 × 10^6 cells/ml using branched polyethylenimine (pei) (sigma) (portolano et al., 2014). media was exchanged after 24 h and supplemented with 2.2 mm valproic acid. supernatant was harvested by centrifugation after 4 days, filtered and loaded onto a 5 ml histrap hp column (cytiva). the column was washed with buffer (20 mm tris ph 8.0, 500 mm nacl, 20 mm imidazole) and the protein was eluted with buffer (20 mm tris ph 8.0, 500 mm nacl, 500 mm imidazole). purified protein was concentrated (amicon ultra 100 kda cut off, millipore sigma) and loaded onto a superose 6 column (cytiva) equilibrated with gf buffer (20 mm tris ph 8.0 and 150 mm nacl). peak fractions were pooled and concentrated to 1.3 mg/ml (amicon ultra 100 kda cut off, millipore sigma). purified s protein ectodomain (0.04 mg/ml) was mixed with v h ab8 (0.02 mg/ml) or soluble ace2 (0.02 mg/ml) and incubated on ice for 10 min.
for the competition experiment, the s protein (0.04 mg/ml) was first incubated on ice with v h ab8 (0.02 mg/ml) for 10 min, followed by addition of ace2 (0.02 mg/ml) for another 10 min. the mixtures (4.8 µl) were applied to 300-mesh copper grids coated with continuous ultrathin carbon. grids were plasma cleaned using an h 2 /o 2 gas mixture for 15 s in a solarus plasma cleaner (gatan inc.) prior to adding the sample. samples were allowed to adsorb for 30 s before blotting away excess liquid, followed by a brief wash with milliq h 2 o. grids were stained by three successive applications of 2% (w/v) uranyl formate (20 s, 20 s, 60 s). grids containing s protein ectodomain with v h ab8, and s protein ectodomain mixed with both v h ab8 and soluble ace2, were imaged using a 200 kv glacios transmission electron microscope (thermofisher scientific) equipped with a falcon3 camera operated in linear mode. using epu automated acquisition software (thermofisher scientific), 15-frame movies were collected at 92,000x magnification (corresponding to a physical pixel size of 1.6 å) over a defocus range of -0.5 to -3.0 µm with an accumulated total dose of 40 e -/å 2 /movie. grids containing purified s protein ectodomain (0.04 mg/ml) with soluble ace2 (0.02 mg/ml) were imaged using a 200 kv glacios transmission electron microscope equipped with a ceta 16m cmos camera (thermofisher scientific). micrographs were collected at 92,000x magnification (physical pixel size 1.6 å) over a defocus range of -0.5 to -3.0 µm with a total dose of 50 e -/å 2 using epu automated acquisition software. c. image processing. motion correction and ctf estimation were performed in relion (3.1) (scheres, 2012). particles were picked by cryolo (1.7.4) (wagner et al., 2019) with a pre-trained model for negative stain data. after extraction, particles were imported to cryosparc live (v2.15.1) (punjani et al., 2017) and subjected to 2d classification and 3d heterogeneous classification.
final density maps were obtained by 3d homogeneous refinement. figures were prepared using ucsf chimera (pettersen et al., 2004). after washing, v h ab8 binding was detected by pe conjugated anti flag tag antibody. to test antibody mediated inhibition of cell fusion, the β-galactosidase (β-gal) reporter gene based quantitative cell fusion assay was used (xiao et al., 2003). in this assay, 293t-s cell expression of t7 rna polymerase was achieved by infection with vaccinia virus vtf7-3, while 293t-ace2 cell expression of t7 promoter controlled β-gal was obtained by infection with vaccinia virus vcb21r. β-gal is expressed only after fusion of the two types of cells, which can be monitored by chromogenic reactions using β-gal substrate. to assay cell-cell fusion, 293t cells stably expressing sars-cov-2 s (293t-s) were infected with t7 polymerase-expressing vaccinia virus (vtf7-3), and 293t cells stably expressing ace2 (293t-ace2) were infected with vaccinia virus (vcb21r lac-z) encoding t7 promoter controlled β-gal. two hours after infection, cells were incubated with fresh medium and transferred to 37 °c for overnight incubation. the next day, 293t-s cells were pre-mixed with serially diluted antibodies or ace2-fc at 37 °c for 1 h followed by incubation with 293t-ace2 cells at a 1:1 ratio for 3 h at 37 °c. cells were then lysed, and the β-gal activity was measured using a β-galactosidase assay kit (substrate cprg, g-biosciences, st. louis, mo) following the manufacturer's protocol. the fusion inhibition percentage for each sample (reading, f) was normalized to the maximal fusion (reading, f_max) of 293t-s and 293t-ace2 cells in the absence of antibodies using this formula: fusion inhibition % = [(f_max - f)/(f_max - f_blank)] × 100%, in which f_blank refers to the od reading of 293t-s and 293t incubation wells. fusion inhibition percentage was plotted against antibody concentrations. experiments were performed in duplicate and the error bars denote ± 1 sd.
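The normalization above can be sketched in a few lines of Python. This is only an illustration of the stated formula; the function name and the od readings in the usage note are hypothetical and do not come from the study.

```python
def fusion_inhibition_percent(f, f_max, f_blank):
    """Percent fusion inhibition from od readings.

    f       : reading of the sample well (antibody present)
    f_max   : reading of 293t-s + 293t-ace2 cells without antibody
    f_blank : reading of 293t-s + 293t (no ace2) wells
    """
    span = f_max - f_blank
    if span <= 0:
        # Without a positive assay window the ratio is meaningless.
        raise ValueError("f_max must be greater than f_blank")
    return (f_max - f) / span * 100.0


def mean_inhibition(duplicate_readings, f_max, f_blank):
    """Average the inhibition of duplicate wells (illustrative helper)."""
    values = [fusion_inhibition_percent(f, f_max, f_blank)
              for f in duplicate_readings]
    return sum(values) / len(values)
```

For example, with hypothetical readings f_max = 1.2 and f_blank = 0.2, a sample well reading 0.7 corresponds to 50% inhibition, a well equal to f_max to 0%, and a well equal to f_blank to 100%.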
pseudovirus neutralization assay. pseudovirus neutralization assay was performed based on previous protocols. briefly, hiv-1 backbone based pseudovirus was produced in 293t cells by co-transfection with plasmid encoding sars-cov-2 s protein and plasmid encoding luciferase expressing hiv-1 genome (pnl4-3.luc.re) using pei. pseudovirus-containing supernatants were collected 48 h later and concentrated using lenti-x™ concentrator kit (takara, ca). pseudovirus neutralization assay was then performed by incubation of sars-cov-2 pseudovirus with serially diluted antibodies or ace2-fc for 1 h at 37 °c, followed by addition of the mixture into pre-seeded 293t-ace2 cells. the mixture was then centrifuged at 1000 × g for 1 h at room temperature. the medium was replaced 4 h later. after 24 h, luciferase expression was determined by bright-glo kits (promega, madison, wi) using biotek synergy multi-mode reader (winooski, vt). cells only and virus only wells were included and used for normalization. the 50% pseudovirus neutralizing antibody titer (ic 50 ) was calculated using graphpad prism 7. experiments were performed in duplicate and the error bars denote ± 1 sd. the microneutralization (mn) assay was used as previously described (agrawal et al., 2016a; agrawal et al., 2016b; du et al., 2013; du et al., 2014). briefly, serial three-fold dilutions, in duplicate, of individual monoclonal antibodies (mabs) were incubated with 120 pfu of sars-cov or sars-cov-2 at room temperature for 2 h before transferring into designated wells of confluent vero e6 cells grown in 96-well microtiter plates. vero e6 cells cultured with medium with or without virus were included as positive and negative controls, respectively. the mers-cov rbd-specific neutralizing m336 mab (ying et al., 2014a) was used as an additional control. after incubation at 37 °c for 4 days, individual wells were observed under the microscope for the status of virus-induced formation of cytopathic effect.
the efficacy of individual mabs was expressed as the lowest concentration capable of completely preventing virus-induced cytopathic effect in 100% of the wells. full-length viruses expressing luciferase were designed and recovered via reverse genetics as described previously (scobey et al., 2013; yount et al., 2003). briefly, the sars-cov-2 rna from infected cell culture was reverse-transcribed and constructed into seven contiguous genomic cdna subclones with interconnecting junctions, which were then bsai/bsmbi digested and ligated into a full-length sars-cov-2 genome cdna through the cohesive ends. a silent mutation of t15102a was introduced into a conserved region in nsp12 to differentiate our recombinant viruses from the circulating sars-cov-2 strains through sanger sequencing. the reporter virus was synthesized by replacing a 276-bp region in orf7 with a gfp-fused nanoluciferase (nluc) gene. after assembly into full-length cdna, full-length rna was in vitro transcribed and electroporated into vero e6 cells. virus stocks were propagated on vero e6 cells in minimal essential medium containing 10% fetal bovine serum (hyclone) and supplemented with penicillin/kanamycin (gibco). viruses were titered in vero e6 usamriid cells to obtain a relative light units (rlu) signal of at least 20× the cell only control background. ab or ace2-fc were serially diluted 4-fold for up to eight dilution spots with a starting concentration of 100 µg/ml, and were incubated with sars-cov-urbani-nluc and sars-cov-2-seattle-nluc viruses at 37 °c with 5% co 2 for 1 hour. then virus-antibody dilution complexes were added to the pre-seeded vero e6 usamriid cells (20,000) in duplicate. virus-only controls and cell-only controls were included in each neutralization assay plate. following infection, plates were incubated at 37 °c with 5% co 2 for 48 hours.
then cells were lysed and luciferase activity was measured via nano-glo luciferase assay system (promega) according to the manufacturer's specifications. sars-cov and sars-cov-2 neutralization ic 50 were defined as the sample concentration at which a 50% reduction in rlu was observed relative to the average of the virus control wells. experiments were performed in duplicate and ic 50 was obtained by the non-linear fitting of neutralization curves in graphpad prism 7. a mouse ace2-adapted sars-cov-2 variant was constructed by introduction of two amino acid changes (q498t/p499y) at the ace2 binding pocket in rbd. virus stocks were grown on vero e6 cells and viral titer was determined by plaque assay. groups of five 10- to 12-month-old female balb/c mice (envigo, #047) were treated prophylactically (12 hours before infection) by intraperitoneal injection with 36, 8, or 2 mg/kg of v h -fc ab8, respectively. mice were challenged intranasally with 10^5 pfu of mouse-adapted sars-cov-2. two days post infection, mice were sacrificed and lung viral titer was determined by the plaque assay. to exclude any effect of residual antibody in the lung on viral titration, mice were euthanized and perfused with 10 ml of pbs via cardiac puncture before lung harvest for viral titration. for virus titration, the caudal lobe of the right lung was homogenized in pbs. the resulting homogenate was serially diluted and inoculated onto confluent monolayers of vero e6 cells, followed by agarose overlay. plaques were visualized via staining with neutral red on day 2 post infection. to measure the viral rna in the lung, tissue homogenate was lysed in trizol ls (thermofisher) and processed following the thermofisher trizol rna isolation protocol, followed by rt-qpcr using the quantifast probe rt-pcr kit (qiagen) to amplify a portion of the upe gene. the 50% tissue culture infectious dose (tcid 50 ) equivalents were estimated by running serial dilutions of known tcid 50 standards.
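One common way to turn rt-qpcr readouts into tcid 50 equivalents from a dilution series of standards is a linear standard curve of ct against log10(tcid 50). The sketch below assumes that approach; the source does not describe the fitting procedure, and the function names and the standard values used in the usage note are hypothetical.

```python
def fit_standard_curve(log10_tcid50, ct_values):
    """Least-squares fit of ct = slope * log10(tcid50) + intercept."""
    n = len(ct_values)
    if n < 2 or n != len(log10_tcid50):
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_tcid50) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_tcid50)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_tcid50, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_tcid50_equivalents(ct, slope, intercept):
    """Invert the standard curve to estimate tcid50 equivalents."""
    return 10 ** ((ct - intercept) / slope)
```

With hypothetical standards at 10^1 to 10^5 tcid 50 giving ct values of 35.0, 31.7, 28.4, 25.1, and 21.8, the fitted slope is about -3.3 cycles per log10, and a sample with ct 28.4 maps back to roughly 10^3 tcid 50 equivalents.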
infection. sars-cov2/canada/on/vido-01/2020 was propagated on vero'76 cells using dmem with 2% fbs and 1 µg/ml l-(tosylamido-2-phenyl) ethyl chloromethyl ketone (tpck) trypsin. infectious work with sars-cov-2 was approved by the biosafety protocol approval committee (bpac) at the university of saskatchewan and performed in the high containment laboratories at vido-intervac. male hamsters (9-week-old) were obtained from charles river (montreal, qc). for evaluations of prophylactic efficacy, all hamsters (n=7) were injected intraperitoneally with 10 mg/kg of v h -fc ab8 24 hours prior to intranasal challenge of 50 µl/nare containing a total of 1×10^5 tcid 50 of sars-cov-2. for the therapeutic group, hamsters were infected as above and treated intraperitoneally with 10 mg/kg (n=3) or 3 mg/kg (n=4) of v h -fc ab8 6 hours post-infection. untreated hamsters were kept as a control. nasal washes and oral swabs were collected on days 1, 3, and 5 post infection (dpi). hamsters were bled at 1 and 5 dpi. all hamsters were euthanized on 5 dpi. at euthanasia, lung lobes were collected for virus titration and rna isolation. for viral titer determination, nasal washes were diluted in a 10-fold dilution series and absorbed on vero'76 cells in triplicate for 1 hour at 37 °c. inoculum was removed and replaced with fresh dmem containing 2% fbs, pen/strep and 1 µg/ml tpck. cytopathic effect was scored on day 3 and day 5 post infection. the limit of detection is 13.6 tcid 50 . for testing viral rna, viral rna was isolated from nasal and oral swabs using the qiaamp viral rna mini kit (qiagen), and the quantifast probe rt-pcr kit (qiagen) was used to amplify a portion of the upe gene. for rna levels in tissues, 30 mg of tissue homogenate in buffer rlt were processed with the rneasy kit (qiagen) followed by rt-qpcr as above. tcid 50 equivalents were estimated by running serial dilutions of known tcid 50 standards.
for testing ab8 concentrations post injection in hamster sera and lung tissue, a sars-cov-2 spike-1 elisa was used. s1 protein was coated at 1 µg/ml overnight at 4 °c in pbs onto maxisorp plates (nunc). the following day, plates were blocked with 5% skim milk and 0.05% tween 20. serum collected on day 1 and day 5 post-challenge was diluted 1:100 and absorbed for 1 hour at 37 °c. plates were washed and goat anti human igg-hrp was added. plates were washed and subsequently developed with opd (o-phenylenediamine dihydrochloride) substrate. optical density was measured at 450 nm after 30 mins of incubation. for lung tissues, after blocking, homogenates were diluted 1:10 and absorbed overnight at 4 °c followed by detection with anti-human igg-hrp and substrate as stated above. the control hamster lung homogenate was used for background correction. for histopathology on day 5 p.i., 10% formalin fixed and paraffin embedded tissues were processed with either hematoxylin and eosin stain (h&e) or immunohistochemistry (ihc) for detection of sars-cov-2 antigen; for ihc, after blocking, tissue slides were treated with anti-nucleocapsid rabbit polyclonal antibodies followed by anti-rabbit hrp antibody. (tucker et al., 2018). the entire library of plasmids is arrayed in duplicate in a matrix format and transfected into hek-293t cells, followed by incubation for 36 h to allow protein expression. before specificity testing, optimal antibody concentrations for screening were determined by using cells expressing positive (membrane-tethered protein a) and negative (mock-transfected) binding controls, followed by flow cytometric detection with an alexa fluor-conjugated secondary antibody (jackson immunoresearch laboratories). based on the assay setup results, v h -fc ab8 (20 µg/ml) was added to the mpa. binding across the protein library was measured on an ique3 (ann arbor, mi) using the same fluorescently labeled secondary antibody.
to ensure data validity, each array plate contained positive (fc-binding; sars-cov-2 s protein) and negative (empty vector) controls. identified targets were confirmed in a second flow cytometric experiment by using serial dilutions of the test antibody. the identity of each target was also confirmed by sequencing. for the mouse model, the statistical significance of difference between v h -fc ab8 treated and control mice lung virus titers was determined by the two-tailed, unpaired, student t test calculated using graphpad prism 7.0. a p value < 0.05 was considered significant. ** p < 0.01. for the mice lung viral titer after perfusion, viral rna and hamster lung viral rna, statistical significance was determined by the mann-whitney u test. a p value < 0.05 was considered significant. ns: p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001. for comparing v h -fc ab8 and igg1 ab1 concentration, significance analysis was determined by the two-way anova followed by tukey test in graphpad prism 7.0. a p value < 0.05 was considered significant. ns: p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. 
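For small groups such as the animal cohorts here, the mann-whitney u test can be computed exactly by enumerating every way of assigning the pooled ranks to the two groups. The sketch below illustrates that idea in plain Python; it is not a reproduction of graphpad prism's implementation, and the function names are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided p value) by full enumeration."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    offset = n1 * (n1 + 1) / 2
    u_observed = sum(ranks[:n1]) - offset
    center = n1 * n2 / 2
    distance = abs(u_observed - center)
    total = extreme = 0
    # Each choice of n1 pooled ranks is equally likely under the null.
    for group in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in group) - offset
        total += 1
        if abs(u - center) >= distance - 1e-12:
            extreme += 1
    return u_observed, extreme / total
```

With two fully separated groups of five values each, only 2 of the 252 possible rank assignments are as extreme as the observed one, so the two-sided p value is 2/252 (about 0.008), which falls under the ** (p < 0.01) threshold used above. Enumeration is practical only for small groups because the number of assignments grows combinatorially.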
immunization with inactivated middle east respiratory syndrome coronavirus vaccine leads to lung immunopathology on challenge with live virus passive transfer of a germline-like neutralizing human monoclonal antibody protects transgenic mice against lethal middle east respiratory syndrome coronavirus infection sars-cov-2 vaccines: status report structures of human antibodies bound to sars-cov spike reveal common epitopes and recurrent features of antibodies potent neutralizing antibodies against sars-cov-2 identified by high-throughput single-cell sequencing of convalescent patients' b cells the convalescent sera option for containing covid-19 neutralizing antibody and soluble ace2 inhibition of a replication-competent vsv-sars-cov-2 and a clinical isolate of sars-cov-2 simulation of the clinical and pathological manifestations of coronavirus disease 2019 (covid-19) in golden syrian hamster model: implications for disease pathogenesis and transmissibility human domain antibodies to conserved sterically restricted regions on gp120 as exceptionally potent cross-reactive hiv-1 neutralizers construction of a large phage-displayed human antibody domain library with a scaffold based on a newly identified highly soluble, stable heavy chain variable domain humanized single domain antibodies neutralize sars-cov-2 by targeting spike receptor binding domain eliminating antibody polyreactivity through addition of n-linked glycosylation detection of 2019 novel coronavirus (2019-ncov) by real-time rt-pcr generation and characterization of alx-0171, a potent novel therapeutic nanobody for the treatment of respiratory syncytial virus infection a mouse-adapted sars-cov-2 model for the evaluation of covid-19 medical countermeasures a truncated receptor-binding domain of mers-cov spike protein potently inhibits mers-cov infection and induces strong neutralizing antibody responses: implication for developing therapeutics and vaccines a conformation-dependent neutralizing monoclonal 
antibody specifically targeting receptor-binding domain in middle east respiratory syndrome coronavirus spike protein site-specific biotinylation of purified proteins using bira in vivo imaging with antibodies and engineered fragments a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov-2 properties, production, and applications of camelid single-domain antibody fragments identification of a critical neutralization determinant of severe acute respiratory syndrome (sars)-associated coronavirus: importance for designing sars vaccines sars-cov-2 reverse genetics reveals a variable infection gradient in the respiratory tract neutralizing nanobodies bind sars-cov-2 spike rbd and block interaction with ace2 syrian hamsters as a small animal model for sars-cov-2 infection and countermeasure development physiological barriers to delivery of monoclonal antibodies and other macromolecules in tumors an emerging coronavirus causing pneumonia outbreak in wuhan, china: calling for developing therapeutic and prophylactic strategies occupancy and mechanism in antibody-mediated neutralization of animal viruses fc-fusion drugs have fcγr/c1q binding and signaling properties that may affect their immunogenicity structure of the sars-cov-2 spike receptor-binding domain bound to the ace2 receptor neutralization of sars-cov-2 spike pseudotyped virus by recombinant ace2-ig potent neutralization of sars-cov-2 in vitro and in an animal model by a human monoclonal antibody. biorxiv : the preprint server for biology antibody aggregation: insights from sequence and structure. antibodies (basel) 5 bat origin of a new human coronavirus: there and back again neutralizing antibodies isolated by a site-directed screening have potent protection on sars-cov-2 infection structural basis for neutralization of sars-cov-2 and sars-cov by a potent therapeutic antibody. 
science antibody fragments: hope and hype camel heavy-chain antibodies: diverse germline v(h)h and specific mechanisms enlarge the antigen-binding repertoire engineered autonomous human variable domains antiviral monoclonal antibodies: can they be more than simple neutralizing agents? ucsf chimera--a visualization system for exploratory research and analysis cross-neutralization of sars-cov-2 by a human monoclonal sars-cov antibody recombinant protein expression for structural biology in hek 293f suspension cells: a novel and accessible approach mutations in spike protein of sars-cov-2 modulate receptor binding cryosparc: algorithms for rapid unsupervised cryo-em structure determination the sars-cov-2 receptor-binding domain elicits a potent neutralizing response without antibodydependent enhancement isolation of potent sars-cov-2 neutralizing antibodies and protection from disease in a small animal model convalescent plasma in covid-19: possible mechanisms of action relion: implementation of a bayesian approach to cryo-em structure determination reverse genetics with a full-length infectious cdna of the middle east respiratory syndrome coronavirus a human neutralizing antibody targets the receptor-binding site of sars-cov-2 pathogenesis and transmission of sars-cov-2 in golden hamsters dynamic light scattering: a practical guide and applications in biomedical sciences potent neutralization of sars-cov-2 by human antibody heavy-chain variable domains potent binding of 2019 novel coronavirus spike protein by a sars coronavirus-specific human monoclonal antibody isolation of state-dependent monoclonal antibodies against the 12-transmembrane domain glucose transporter 4 using virus-like particles pharmacokinetic properties of igg and various fc fusion proteins in mice fcr gamma-chain is essential for both surface expression and function of human fc gamma ri (cd64) in vivo sphire-cryolo is a fast and accurate fully automated particle picker for cryo-em structural basis 
for potent neutralization of betacoronaviruses by single-domain camelid antibodies identification of human single-domain antibodies against sars-cov-2 a noncompeting pair of human neutralizing antibodies block covid-19 virus binding to its receptor ace2 the sars-cov s glycoprotein: expression and functional characterization structural basis for the recognition of sars-cov-2 by full-length human ace2 exceptionally potent neutralization of middle east respiratory syndrome coronavirus by human monoclonal antibodies monomeric igg1 fc molecules displaying unique fc receptor interactions that are exploitable to treat inflammation-mediated diseases reverse genetics with a full-length infectious cdna of severe acute respiratory syndrome coronavirus a highly conserved cryptic epitope in the receptor-binding domains of sars-cov-2 and sars-cov a safe and convenient pseudovirus-based inhibition assay to detect neutralizing antibodies and screen for viral entry inhibitors against the novel human coronavirus mers-cov potent cross-reactive neutralization of sars coronavirus isolates by human monoclonal antibodies potently neutralizing and protective human antibodies against sars-cov-2 key: cord-342189-ya05m58o authors: banerjee, abhik k.; blanco, mario r.; bruce, emily a.; honson, drew d.; chen, linlin m.; chow, amy; bhat, prashant; ollikainen, noah; quinodoz, sofia a.; loney, colin; thai, jasmine; miller, zachary d.; lin, aaron e.; schmidt, madaline m.; stewart, douglas g.; goldfarb, daniel; de lorenzo, giuditta; rihn, suzannah j.; voorhees, rebecca; botten, jason w.; majumdar, devdoot; guttman, mitchell title: sars-cov-2 disrupts splicing, translation, and protein trafficking to suppress host defenses date: 2020-10-08 journal: cell doi: 10.1016/j.cell.2020.10.004 sha: doc_id: 342189 cord_uid: ya05m58o sars-cov-2 is a recently identified coronavirus that causes the respiratory disease known as covid-19. 
despite the urgent need, we still do not fully understand the molecular basis of sars-cov-2 pathogenesis. here, we comprehensively define the interactions between sars-cov-2 proteins and human rnas. nsp16 binds to the mrna recognition domains of the u1 and u2 splicing rnas and acts to suppress global mrna splicing upon sars-cov-2 infection. nsp1 binds to 18s ribosomal rna in the mrna entry channel of the ribosome and leads to global inhibition of mrna translation upon infection. finally, nsp8 and nsp9 bind to the 7sl rna in the signal recognition particle and interfere with protein trafficking to the cell membrane upon infection. disruption of each of these essential cellular functions acts to suppress the interferon response to viral infection. our results uncover a multipronged strategy utilized by sars-cov-2 to antagonize essential cellular processes to suppress host defenses. coronaviruses are a family of viruses with notably large single-stranded rna genomes and broad species tropism among mammals (graham and baric, 2010) . recently, a coronavirus, sars-cov-2, was discovered to cause the severe respiratory disease known as covid-19. it is highly transmissible within human populations and its spread has resulted in a global pandemic with more than a million deaths to date (andersen et al., 2020; zou et al., 2020) . we do not fully understand the molecular basis of infection and pathogenesis of this virus in human cells. accordingly, there is an urgent need to understand these mechanisms to guide the development of therapeutics. sars-cov-2 encodes 27 proteins with diverse functional roles in viral replication and packaging( bar-on et al., 2020; wang et al., 2020) . these include 4 structural proteins: the nucleocapsid (n, which binds the viral rna), and the envelope (e), membrane (m), and spike (s) proteins, which are integral membrane proteins. 
in addition, there are 16 non-structural proteins (nsp1-16) which encode the rna-directed rna polymerase, helicase, and other components required for viral replication (da silva et al., 2020). finally, there are 7 accessory proteins (orf3a-8) whose functions in viral replication or packaging remain largely uncharacterized (chen and zhong, 2020; finkel et al., 2020). as obligate intracellular parasites, viruses require host cell components to translate and transport their proteins and to assemble and secrete viral particles (maier et al., 2016). upon viral infection, the mammalian innate immune system acts to rapidly detect and block viral infection at all stages of the viral life cycle (chow et al., 2018; jensen and thomsen, 2012; wilkins and gale, 2010). the primary form of intracellular viral surveillance engages the interferon pathway, which amplifies signals resulting from detection of intracellular viral components to induce a systemic type i interferon response upon infection (stetson and medzhitov, 2006). specifically, cells contain various rna sensors (such as rig-i and mda5) that detect the presence of viral rnas and promote nuclear translocation of the transcription factor irf3, leading to transcription, translation, and secretion of interferon (e.g. ifn-α and ifn-β). binding of interferon to cognate cell-surface receptors leads to transcription and translation of hundreds of antiviral genes. in order to successfully replicate, viruses employ a range of strategies to counter host antiviral responses (beachboard and horner, 2016). in addition to their essential roles in the viral life cycle, many viral proteins also antagonize core cellular functions in human cells to evade host immune responses.
for example, human cytomegalovirus (hcmv) encodes proteins that inhibit class 1 major histocompatibility (mhc) display on the cell surface by retaining mhc proteins in the endoplasmic reticulum (miller et al., 1998), polioviruses encode proteins that degrade translation initiation factors (eif4g) to prevent translation of 5'-capped host mrnas (kempf and barton, 2008; lloyd, 2006), and influenza a encodes a protein that modulates mrna splicing to degrade the mrna that encodes rig-i (kochs et al., 2007; zhang et al., 2018). suppression of the interferon response has recently emerged as a major clinical determinant of covid-19 severity, with almost complete loss of secreted ifn characterizing the most severe cases (hadjadj et al., 2020). the extent to which sars-cov-2 suppresses the interferon response is a key characteristic that distinguishes covid-19 from sars and mers (lokugamage et al., 2020). several strategies have been proposed for how the related sars- and mers-causing viruses may hijack host cell machinery and evade immune detection, including repression of host mrna transcription in the nucleus (canton et al., 2018), degradation of host mrna in the nucleus and cytoplasm (kamitani et al., 2009), and inhibition of host translation (nakagawa et al., 2018). nonetheless, the extent to which sars-cov-2 uses these or other strategies, and how they may be executed at a molecular level, remain unclear. understanding the interactions between viral proteins and components of human cells is essential for elucidating their pathogenic mechanisms and for development of effective therapeutics. because sars-cov-2 is an rna virus and many of its encoded proteins are known to bind rna (sola et al., 2011), we reasoned that these viral proteins may interact with specific human mrnas (critical intermediates in protein production) or non-coding rnas (critical structural components of diverse cellular machines) to promote viral propagation.
here, we comprehensively define the interactions between each sars-cov-2 protein and human rnas. we show that 10 viral proteins form highly specific interactions with mrnas or ncrnas, including those involved in progressive steps of host cell protein production. we show that nsp16 binds to the mrna recognition domains of the u1 and u2 rna components of the spliceosome and acts to suppress global mrna splicing in sars-cov-2-infected human cells. we find that nsp1 binds to a precise region on the 18s ribosomal rna that resides in the mrna entry channel of the initiating 40s ribosome. this interaction leads to global inhibition of mrna translation upon sars-cov-2 infection of human cells. finally, we find that nsp8 and nsp9 bind to discrete regions on the 7sl rna component of the signal recognition particle (srp) and interfere with protein trafficking to the cell membrane upon infection. we show that disruption of each of these essential cellular functions acts to suppress the type i interferon response to viral infection. together, our results uncover a multipronged strategy utilized by sars-cov-2 to antagonize essential cellular processes and robustly suppress host immune defenses. we cloned all 27 of the known sars-cov-2 viral proteins into mammalian expression vectors containing an n-terminal halotag (los et al., 2008) (figure s1a, methods), expressed each in hek293t cells, and exposed them to uv light to covalently crosslink proteins to their bound rnas. we then lysed the cells and purified each viral protein using stringent, denaturing conditions to disrupt any non-covalent associations and capture those with a uv-mediated interaction (figure 1a, methods). as positive and negative controls, we purified a known human rna binding protein (ptbp1) and a metabolic protein (gapdh) (figure s1a-e). we successfully purified 26 of the 27 viral proteins (figure s1a; full-length spike was not soluble when expressed).
we found that 10 viral proteins (nsp1, nsp4, nsp8, nsp9, nsp12, nsp15, nsp16, orf3b, n, and e protein) bind to specific host rnas (p-value < 0.001, figure 1b, table s1), including 6 structural ncrnas and 142 mrnas (table s1). these include mrnas involved in protein translation (e.g. cops5, eif1, and rps12), protein transport (atp6v1g1, slc25a6, and tomm20), protein folding (hspa5, hspa6, and hspa1b), transcriptional regulation (yy1, id4, and ier5), and immune response (jun, aen, and rack1) (fdr < 0.05, figure 1b, s1f). importantly, the observed interactions are highly specific for each viral protein, and each protein binds to a precise region within each rna (figures 1c, s1f). using these data, we identified several viral proteins that interact with structural ncrna components of the spliceosome (u1 and u2 snrna), the ribosome (18s and 28s rrna), and the signal recognition particle (7sl) (figure 1b). because these molecular machines are required for three essential steps of protein production (mrna splicing, translation, and protein trafficking), we focused on their interactions with viral proteins to understand their functions and mechanisms in sars-cov-2 pathogenesis. after transcription in the nucleus, nascent pre-mrnas are spliced to generate mature mrnas which are translated into protein. splicing is mediated by a complex of ncrnas and proteins known as the spliceosome. specifically, the u1 small nuclear rna (snrna) hybridizes to the 5' splice site at the exon-intron junction and the u2 snrna hybridizes to the branchpoint site within the intron to initiate splicing of virtually all human mrnas (séraphin et al., 1988). we identified a highly specific interaction between the nsp16 viral protein and the u1 and u2 snrnas (figure 1b). because u1 and u2 are small rnas (164 and 188 nucleotides, respectively), we noticed strong enrichment of nsp16-associated reads across the entire length of each.
to more precisely define the binding sites, we exploited the well-described tendency of reverse transcriptase to preferentially terminate when it encounters a uv-crosslinked protein on rna (konig et al., 2010) (figures 1a, s1d) . we determined that nsp16 binds to the 5' splice site recognition sequence of u1 (figures 2a-b, s2a based on the locations of the nsp16 binding sites relative to the mrna recognition domains of the u1/u2 spliceosomal components, we hypothesized that nsp16 might disrupt splicing of newly transcribed genes ( figure 2f ). to test this, we co-expressed nsp16 in human cells along with a splicing reporter derived from irf7 (an exon-intron-exon minigene) fused to gfp (majumdar et al., 2018) . in this system, if the reporter is spliced, then gfp is made; if not, translation is terminated (via a stop codon present within the first intron) and gfp is not produced ( figure 3a) . we observed a >3-fold reduction in gfp levels in the presence of nsp16 compared to a control human protein (figures 3b, s3a ). to explore whether nsp16 has a global impact on splicing of endogenous mrnas, we measured the splicing ratio of each gene using nascent rna sequencing. specifically, we metabolically labeled nascent rna by feeding cells for 20 minutes with 5-ethynyl uridine (5eu), purified and sequenced 5eu-labeled rna, and quantified the proportion of unspliced fragments spanning the 3' splice site of each gene (figure 3c, s3b) . we observed a global increase in the fraction of unspliced genes in the presence of nsp16 compared to controls ( figure 3d, s3c,d) . given that nsp16 is sufficient to suppress global mrna splicing, we expect that its expression in sars-cov-2-infected cells would result in a global mrna splicing deficit. to test this, we infected human lung epithelial cells (calu3) with sars-cov-2 and measured splicing levels of newly transcribed mrnas compared to a mock infected control. 
as expected, we observed a global increase in the fraction of unspliced transcripts upon sars-cov-2 infection, with ~90% of measured genes showing increased intron retention (figure 3e, s3e). together these results indicate that nsp16 binds to the splice site and branch point sites of u1/u2 to suppress global mrna splicing in sars-cov-2 infected cells (figure 3f). although nsp16 is known to act as an enzyme that deposits 2'-o-methyl modifications on viral rnas (decroly et al., 2011), our results demonstrate that it also acts as a host virulence factor. global disruption of mrna splicing may act to decrease host protein and mrna levels by triggering nonsense-mediated decay of improperly spliced mrnas (kurosaki et al., 2019). consistent with this, we observed a strong global decrease in steady-state mrna levels (relative to ncrna levels) upon sars-cov-2 infection (figure s3f).

inhibition of mrna splicing suppresses host interferon response to viral infection

because many of the key genes stimulated by interferon (ifn) are spliced, we reasoned that mrna splicing would be critical for a robust ifn response. to test this, we utilized a reporter line engineered to express alkaline phosphatase upon ifn signaling (mimicking an antiviral response gene). this ifn stimulated gene (isg) reporter line can be stimulated using ifn-β and assayed for reporter induction. we observed strong repression of this ifn responsive gene upon expression of nsp16 (figure 3g) and upon addition of a small molecule that interferes with spliceosomal assembly (figure s3g). these results demonstrate that one outcome of nsp16-mediated inhibition of mrna splicing is to reduce the host cells' innate immune response to viral recognition. consistent with such a role, we observed an increase in intron retention within multiple ifn-responsive genes (such as isg15 and rig-i) upon sars-cov-2 infection (figure 3h, s3h-i).
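the splicing measurements in this section rest on a per-gene unspliced fraction: the proportion of reads spanning a 3' splice site that still contain intron sequence. the sketch below illustrates that calculation and the share of genes whose unspliced fraction rises between two conditions. it is a minimal illustration, not the authors' pipeline: the function names, the gene labels, and the read counts are all hypothetical, and it assumes reads have already been classified as spliced or unspliced.

```python
from typing import Dict, Tuple

# (unspliced_reads, spliced_reads) spanning the 3' splice sites of one gene.
Counts = Tuple[int, int]


def unspliced_fraction(unspliced: int, spliced: int) -> float:
    """Proportion of splice-site-spanning reads that remain unspliced."""
    total = unspliced + spliced
    if total == 0:
        raise ValueError("no reads span the splice site")
    return unspliced / total


def share_of_genes_increased(control: Dict[str, Counts],
                             treated: Dict[str, Counts]) -> float:
    """Share of genes (present in both conditions) whose unspliced
    fraction is higher in the treated condition than in the control."""
    shared = [gene for gene in control if gene in treated]
    if not shared:
        raise ValueError("no genes are shared between conditions")
    increased = sum(
        unspliced_fraction(*treated[gene]) > unspliced_fraction(*control[gene])
        for gene in shared
    )
    return increased / len(shared)


# Hypothetical counts, for illustration only.
mock = {"geneA": (10, 90), "geneB": (20, 80)}
infected = {"geneA": (30, 70), "geneB": (10, 90)}
print(share_of_genes_increased(mock, infected))
```

a real analysis would also need a minimum read-depth filter per gene and a statistical test across replicates, neither of which is shown here.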
once exported to the cytoplasm, spliced mrna is translated into protein on the ribosome. initiation of translation begins with recognition of the 5' cap by the small 40s subunit (which scans the mrna to find the first start codon). we observed that nsp1 binds exclusively to the 18s ribosomal rna (figure 1b and s4a), the structural rna component of the 40s ribosomal subunit. several roles for nsp1 have been reported in sars-cov and mers-cov including roles in viral replication, translational inhibition, transcriptional inhibition, mrna degradation, and cell cycle arrest (brockway and denison, 2005; kamitani et al., 2009; lokugamage et al., 2015; narayanan et al., 2015). one of the reported roles for nsp1 in sars-cov is that it can associate with the 40s ribosome to inhibit host mrna translation (kamitani et al., 2009; tanaka et al., 2012), yet it remains unknown whether this association is due to interaction with the ribosomal rna, protein components of the ribosome, or other auxiliary ribosomal factors. accordingly, the mechanisms by which nsp1 acts to suppress protein production remain elusive.

we mapped the location of nsp1 binding to a 37 nucleotide region corresponding to helix 18 (figure 4a), adjacent to the mrna entry channel (simonetti et al., 2020) (figure 4b). the interaction would position nsp1 to disrupt 40s mrna scanning and prevent translation initiation (figure 4b), and disrupt trna recruitment to the 80s ribosome and block protein production (figure s4b). interestingly, the nsp1 binding site includes the highly conserved g626 nucleotide which monitors the minor groove of the codon-anticodon helix for trna binding fidelity (ogle et al., 2001). we noticed that the c-terminal region of nsp1 has similar structural regions to serbp1 (brown et al., 2018) and stm1 (ben-shem et al., 2011a), two known ribosome inhibitors that bind within the mrna entry channel to preclude mrna access (figure s4c).
consistent with this, a recent cryo-em structure confirms that nsp1 binds to these same nucleotides of 18s within the mrna entry channel (thoms et al., 2020) . given the location of nsp1 binding on the 40s ribosome, we hypothesized that it could suppress global initiation of mrna translation. to test this, we performed in vitro translation assays of a gfp reporter in hela cell lysates and found that addition of nsp1 led to potent inhibition of translation ( figure s4d ). we observed a similar nsp1-mediated translational repression when we co-expressed nsp1 and a gfp reporter gene in hek293t cells (figure 4c-d) . in contrast, we did not observe this inhibition when we expressed other sars-cov-2 proteins (nsp8, nsp9, m) or human proteins (gapdh) ( figure 4d ). to determine if nsp1 leads to translational inhibition of endogenous proteins in human cells, we used a technique called surface sensing of translation (sunset) to measure global protein production levels (schmidt et al., 2009) . in this assay, translational activity is measured by the level of puromycin incorporation into elongating polypeptides ( figure s4e ). we observed a strong reduction in the level of global puromycin integration in cells expressing nsp1 compared to cells expressing gfp (figure s4f-g) . because nsp1 expression is sufficient to suppress global mrna translation in human cells, we hypothesized that sars-cov-2 infection would also suppress global translation. to test this, we infected a human lung epithelial (calu3) or monkey kidney (vero) cell line with sars-cov-2 and measured nascent protein synthesis levels using sunset. we observed a strong reduction of to explore whether nsp1 binding to 18s rrna is critical for translational repression, we generated a mutant nsp1 in which two positively charged amino acids (k164 and h165) in the c-terminal domain were replaced with alanine residues (figure s4c ) (narayanan et al., 2008) . 
we observed a complete loss of in vivo contacts with 18s (figure 4g); because this mutant disrupts ribosome contact, we refer to it as nsp1∆rc. we co-expressed gfp and nsp1∆rc in hek293t cells and found that the mutant fails to inhibit translation (figure 4h and s4j). in contrast, mutations to the positively charged amino acids at positions 124/125 do not impact 18s binding (figure 4g) or the ability to inhibit translation (figure 4h). together, these results demonstrate that nsp1 binds within the mrna entry channel of the ribosome and that this interaction is required for translational inhibition of host mrnas upon sars-cov-2 infection.

we explored whether nsp1 binding to 18s rrna suppresses the ability of cells to respond to ifn-β stimulation upon viral infection. we transfected isg reporter cells with nsp1, stimulated with ifn-β, and observed robust repression of the ifn responsive gene (>6-fold, figure 4i). to confirm that this nsp1-mediated repression occurs in human cells upon activation of double stranded rna (dsrna)-sensing pathways typically triggered by viral infection, we treated a human lung epithelial cell line (a549) with poly(i:c), a molecule that is structurally similar to dsrna and known to induce an antiviral innate immune response (alexopoulou et al., 2001; kato et al., 2006) (figure s4k). we observed a marked downregulation of ifn-β protein and endogenous ifn-β responsive mrnas in the presence of nsp1, but not in the presence of nsp1∆rc (figure s4l, m). these results demonstrate that nsp1, through its interaction with 18s rrna, suppresses the innate immune response to viral recognition (figure 4j).

because nsp1 blocking the mrna entry channel would impact both host and viral mrna translation, we explored how translation of viral mrnas is protected from nsp1-mediated translational inhibition.
many viruses contain 5' untranslated regions that regulate viral gene expression and translation (gaglia et al., 2012) ; all sars-cov-2 encoded subgenomic rnas contain a common 5' leader sequence that is added during negative strand synthesis (kim et al., 2020b) . we explored whether the leader sequence protects viral mrnas from translational inhibition by fusing the viral leader sequence to the 5' end of gfp or mcherry reporter genes ( figure s5a ). we found that nsp1 fails to suppress translation of these leader-containing mrnas ( figure 5a -b, s5b). we dissected the leader sequence and found that the first stem loop (sl1) is sufficient to prevent translational suppression upon nsp1 expression ( figure 5c) or sars-cov-2 infection ( figure 5d ). we considered three models for how the leader could protect viral mrnas: (i) it could compete with the ribosome for nsp1 binding, (ii) it could directly recruit free ribosomes or (iii) nsp1 could bind to the leader independently of its ribosome interaction to allosterically modulate the nsp1-ribosome interaction. we reasoned that if the leader competes for nsp1 binding or directly recruits free ribosomes, then the presence of sl1 should be sufficient for protection, regardless of its precise position in the 5' utr. in contrast, if the leader allosterically modulates ribosome binding then the spacing between the 5' cap (which is bound to nsp1-40s) and sl1 would be critical for protection. to distinguish between these models, we swapped the location of sl1 and sl2 in the 5' leader or inserted 5 nucleotides between the 5' cap and sl1 ( figure s5c ) and found that both mutants ablate protection ( figure 5e, s5d) . these results indicate that an mrna requires the 5' leader to be precisely positioned relative to the nsp1-bound 40s ribosome to enable translational initiation ( figure 5f ). 
while many aspects of this allosteric model remain to be explored, it would explain how leader-mediated protection can occur on an mrna only when present in cis. moreover, this model suggests that nsp1 might also act to further increase viral mrna translation by actively recruiting the ribosome to its own mrnas. in line with this, we observe a consistent ~20% increase in translation of leader-containing reporters upon viral infection (figure 5d) or expression of nsp1 (figure s5e).

upon engaging the start codon in an mrna, the 60s subunit of the ribosome is recruited to form the 80s ribosome, which translates mrna. the signal recognition particle (srp) is a universally conserved complex that binds to the 80s ribosome and acts to co-translationally scan the nascent peptide to identify hydrophobic signal peptides present in integral membrane proteins and proteins secreted from the plasma membrane (akopian et al., 2013). when these are identified, srp triggers ribosome translocation to the endoplasmic reticulum (er) to ensure proper folding and trafficking of these proteins to the cell membrane (akopian et al., 2013). we identified two viral proteins, nsp8 and nsp9, that bind at distinct and highly specific regions within the s-domain of the 7sl rna scaffold of srp (figure 6a, s6a). nsp8 interacts with 7sl in the region bound by srp54 (the protein responsible for signal peptide recognition, srp-receptor binding, and ribosome translocation) (akopian et al., 2013; holtkamp et al., 2012) (figure 6b). nsp9 binds to 7sl in the region that is bound by the srp19 protein (figure 6b), which is required for proper folding and assembly of srp (including proper loading of srp54) (akopian et al., 2013). because srp scans nascent peptides co-translationally, we were intrigued to find that nsp8 also forms a highly specific interaction with 28s rrna (the structural component of the 60s subunit) (figure 6c, s6b).
the binding site on 28s rrna corresponds to the largest human-specific expansion segment within the ribosome, referred to as es27 (parker et al., 2018). es27 is highly dynamic, and thus has not been resolved in most ribosome structures (zhang et al., 2014). however, when engaged by specific factors, es27 can become ordered, and was recently shown to be capable of interacting with the ribosome exit tunnel, adjacent to the 60s binding site of srp (figure 6d, s6c) (wild et al., 2020). together, these observations suggest that nsp8 and nsp9 bind to the co-translational srp complex. consistent with this, we find that nsp8 and nsp9 localize broadly throughout the cytoplasm when expressed in human cells (figure s6d) or upon sars-cov-2 infection (figure s6e-f).

because nsp8 and nsp9 binding on 7sl are positioned to disrupt srp function, we hypothesized that they may alter translocation of secreted and integral membrane proteins (figure s7a). to test this, we expressed an srp-dependent membrane protein (nerve growth factor receptor, ngfr (izon et al., 2001a)) fused via an internal ribosome entry site (ires) to a non-membrane gfp (figure s7f). in this system, if a perturbation specifically affects membrane protein levels we expect to see a decrease in the ratio of membrane to non-membrane protein levels. to ensure that the ngfr reporter accurately reports on srp function, we treated hek293t cells with sirnas against srp54 or srp19 and found that both lead to a dramatic reduction of the ngfr membrane protein relative to the non-membrane gfp protein (figure s7b). similarly, we found that expression of nsp8 and nsp9 (alone or together) leads to a striking reduction in expression of ngfr relative to gfp (figure 7a). expression of control proteins did not specifically impact ngfr levels (figure 7a, s7b).
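the readout of this reporter is a ratio of membrane (ngfr) to non-membrane (gfp) signal, compared between a perturbed sample and a control. a minimal sketch of that comparison follows. the function name and all signal values are hypothetical and chosen only for illustration; it assumes each signal is a background-corrected summary intensity (for example, a median from flow cytometry), which the text above does not specify.

```python
def normalized_membrane_ratio(ngfr: float, gfp: float,
                              control_ngfr: float, control_gfp: float) -> float:
    """Membrane/non-membrane ratio of a sample, relative to a control.

    A value near 1 means the perturbation changes both reporters equally
    (or not at all); a value below 1 means the membrane reporter is
    reduced more than the non-membrane reporter.
    """
    if min(gfp, control_ngfr, control_gfp) <= 0:
        raise ValueError("denominator signals must be positive")
    sample_ratio = ngfr / gfp
    control_ratio = control_ngfr / control_gfp
    return sample_ratio / control_ratio


# Hypothetical intensities: the membrane reporter drops 4-fold while the
# non-membrane reporter is unchanged.
print(normalized_membrane_ratio(ngfr=200.0, gfp=1000.0,
                                control_ngfr=800.0, control_gfp=1000.0))
```

dividing by the co-expressed gfp signal is what separates a membrane-specific effect from a general drop in expression: a global translation inhibitor would lower both reporters and leave the ratio close to 1.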
to determine if there is a global effect on membrane protein levels, we utilized the sunset method to measure puromycin levels in membrane proteins using flow cytometry (see methods). we confirmed that disruption of srp leads to a global reduction in puromycin levels in the cell membrane (figure s7c). we observed a comparable global reduction of puromycin-labeled membrane proteins upon expression of nsp8 or nsp9 individually or together, but not with control proteins (figure 7b, s7c). because nsp8 and nsp9 are each sufficient to suppress protein integration into the cell membrane, we anticipate that sars-cov-2 infection would lead to similar suppression. however, determining whether sars-cov-2 infection specifically impacts membrane protein expression is confounded by the fact that nsp1 inhibits translation of membrane and non-membrane proteins upon infection. to address this, we co-expressed a membrane protein reporter (ngfr) containing the 5' viral leader along with a non-membrane gfp reporter containing the viral leader. upon viral infection, we observed a strong reduction of membrane protein levels (figure 7c), but no reduction in non-membrane gfp levels (figure 5d). to ensure that these effects are specific to sars-cov-2 infected cells, we separated individual cells within the infected population into those expressing the viral spike protein (s+) and those not expressing the protein (s-). we found that the shift in membrane protein levels only occurs in s+ cells (figure 7d), while the s- population resembled the mock infected samples (figure 7c). we observed a strong relationship between the level of spike protein (likely reflecting the amount of viral replication within each cell) and the level of membrane protein suppression (figure 7c). we observed this membrane protein-specific decrease upon infection of human lung epithelial (calu3, figure s7d) and monkey kidney (vero, figure 7c-d) cell lines.
together, these results demonstrate that nsp8 and nsp9 bind to 7sl to disrupt srp function and suppress membrane protein trafficking in sars-cov-2 infected cells. although nsp8 and nsp9 are thought to be components of the viral replication machinery (sutton et al., 2004), our results indicate that they play an additional role as host virulence factors. because viral membrane proteins also require trafficking to the er, viral disruption of srp might negatively impact viral propagation, unless viral proteins are trafficked in an srp-independent manner (figure s7e) or nsp8/9 selectively impacts host (but not viral) proteins.

next we explored how disruption of srp might be advantageous for viral propagation. because secretion of ifn and other cytokines depends on the srp complex (figure s7f), a central component of the ifn response is dependent on srp. accordingly, we hypothesized that nsp8/9-mediated viral suppression of srp would act to suppress the ifn response upon infection. to test this, we co-expressed nsp8 and nsp9 and observed a significant reduction in the ifn response relative to a control protein (figure s7g). together, these results suggest that sars-cov-2 mediated suppression of srp-dependent protein secretion enables suppression of host immune defenses (figure 7e). interestingly, many proteins involved in anti-viral immunity (including most cytokines and class i major histocompatibility complex) are membrane-anchored or secreted, and are known to use the srp pathway for transport (vermeire et al., 2014) (figure s7f), suggesting that there may be other effects of srp pathway inhibition on sars-cov-2 pathogenesis.
we identified several pathogenic functions of sars-cov-2 in human cells (including global inhibition of host mrna splicing, protein translation, and membrane protein trafficking) and described the molecular mechanisms by which the virus acts to disrupt these essential cell processes. interestingly, all of the viral proteins involved (nsp1, nsp8, nsp9, and nsp16) are produced in the first stage of the viral life cycle, prior to generation of double stranded rna (dsrna) products during viral genome replication. because dsrna is detected by host immune sensors and triggers the type i interferon response, disruption of these cellular processes would allow the virus to replicate its genome while minimizing the host innate immune response.

disruption of these three non-overlapping steps of protein production may represent a multipronged mechanism that synergistically acts to suppress the host antiviral response (figure 7f). specifically, the ifn response is usually boosted >1,000-fold upon viral detection (through amplification and feedback, figure s4k), yet each individual mechanism impacts ifn levels on the order of ~5-10-fold. accordingly, if each independent mechanism impacts ifn levels moderately, the three together may be able to achieve dramatic suppression of ifn (10^3 = 1,000-fold). this multi-pronged mechanism may explain the molecular basis for the potent suppression of ifn observed in severe covid-19 patients.

interferon is emerging not only as a determinant of disease severity, but also a potential treatment option (zhou et al., 2020). as such, our work identifies several therapeutic opportunities for boosting ifn levels upon sars-cov-2 infection. for example, disrupting the interaction between nsp1 and 18s rrna could allow cells to detect and respond to viral infection. because many small-molecule drugs target ribosomal rnas (liaud et al., 2019), it may be possible to develop drugs to block the nsp1-18s and other interactions.
additionally, disrupting the 5' viral leader may be a potent antiviral strategy since it is critical for translation of all viral proteins. because sl1 is a structured rna, it may be possible to design small molecules that specifically bind this structure to suppress viral protein production (hermann, 2016).

viral suppression of these cellular functions is not exclusive to the ifn response and will also impact other spliced, translated, secreted, and membrane proteins. many proteins involved in anti-viral immunity are spliced and/or membrane-anchored or secreted. one example is class i major histocompatibility complex (mhc), which is critical for antigen presentation to cd8 t cells at the cell surface of infected cells (hansen and bouvier, 2009). by antagonizing membrane trafficking, sars-cov-2 may prevent viral antigens from being presented on mhc and allow infected cells to escape t-cell recognition and clearance. in this way, interference with these essential cellular processes might further aid sars-cov-2 in evading the host immune response.

more generally, we expect that insights gained from the sars-cov-2 protein-rna binding maps will be critical for exploring additional viral mechanisms. specifically, we identified many other interactions, including highly specific interactions with mrnas. for example, nsp12 binds to the jun mrna (figure s1e), which encodes the critical immune transcription factor c-jun that is activated in response to multiple cytokines and immune signaling pathways (weston and davis, 2007). we also identified an interaction between nsp9 and the start codon of the mrna that encodes cops5 (figure 1c), the enzymatic subunit of the cop9 signalosome complex which regulates protein homeostasis (cope and deshaies, 2003), suggesting that nsp9 might disrupt translation of cops5.
interestingly, cops5 (also known as jab1) is known to bind and stabilize c-jun protein levels (claret et al., 1996) and several viruses are known to disrupt this protein (lungu et al., 2008; oh et al., 2006; tanaka et al., 2006) . while it remains unknown what, if any, role these interactions play in virally infected cells, the specificity suggests that they may provide a selective advantage for viral propagation. together, our results demonstrate that global mapping of rna binding by viral proteins could enable rapid characterization of mechanisms for emerging pathogenic rna viruses. we note several limitations of our current study that will need to be explored in future work. (i) our mapping experiments were performed in uninfected human cells expressing tagged viral proteins. accordingly, it remains possible that our maps may not fully capture all of the interactions that occur when human cells are infected, such as interactions that occur with viralinduced rnas, in specific viral compartments, or that require multiple viral proteins. (ii) while we characterized the functional and mechanistic roles of several viral proteins and structural ncrnas, we did not explore what roles viral protein interactions with mrnas might play. (iii) how the virus disrupts fundamental cellular processes while still maintaining its own production is still largely undefined. while we showed that the 5' leader is sufficient to relieve translational inhibition by nsp1, we still do not fully understand how this protection occurs and specifically how nsp1 might interact with the viral leader or allosterically modulate ribosome binding. similarly, viral membrane proteins are dependent on trafficking to the er and how nsp8/9 might selectively impact er translocation of host -but not viral -proteins remains to be explored. 
(iv) while we showed that viral disruption of these essential cellular functions can suppress ifn, what other roles host cell shutdown might play in viral pathogenesis and in suppressing other aspects of anti-viral immunity, including possible roles in adaptive immune responses, have not been explored.

the authors declare no competing interests.

further information and requests for reagents and resources should be directed to and will be fulfilled by the lead contact, mitchell guttman (mguttman@caltech.edu).

all constructs and plasmids generated in this study will be made available on request sent to the lead contact with a completed materials transfer agreement.

all datasets generated during this study are available at ncbi short read archive: bioproject prjna665692 (viral protein purifications) and prjna665581 (nascent and total rna-seq).

essential medium (atcc) containing 10% fbs and 1% penicillin-streptomycin purchased from thermo fisher scientific. all cell lines were maintained at 37°c under 5% co2. cells were grown in a humidified incubator at 37°c with 5% co2.

all experiments using infectious sars-cov-2 conducted at the uvm bsl-3 facility were

cloning of expression constructs. sars-cov-2 protein constructs (with the exception of nsp11) were a gift from fritz roth (see table s3 for addgene information) (kim et al., 2020a) and were lr-cloned (invitrogen gateway cloning, thermo fisher scientific) into mammalian expression destination vector pcag-halo-tev-dest-v5-ires-puror. note that following lr cloning, proteins were not v5-tagged because all entry clones contained stop codons. for nsp11, an entry clone was generated by bp cloning (invitrogen gateway cloning, thermo

manganese/calcium mix (0.5mm cacl2, 2.5mm mncl2). samples were incubated on ice for 10 minutes to allow lysis to proceed. the lysates were then incubated at 37°c for 10 minutes at 700 rpm shaking on a thermomixer (eppendorf).
lysates were cleared by centrifugation at 15,000× g for 2 minutes. the supernatant was collected and kept on ice until bound to the halolink resin (promega). of the 1ml lysis volume, 50ul was set aside for input, 20ul used for protein expression confirmation, and the rest for capture on halolink resin as described below. for qpcr analysis, cdna was generated from purified rna using maxima h-reverse transcriptase (thermo fisher scientific) following manufacturer's recommendations. amplification reactions were assembled with primer sets indicated in table s2 and lightcycler® 480 sybr green i master (roche) following manufacturer's protocols and read out in a roche lightcycler 480. library construction. rna-seq libraries were constructed from purified rna as previously described (van nostrand et al., 2016) . briefly, after proteinase k elution, the rna was dephosphorylated (fast ap) and cyclic phosphates removed (t4 pnk) and then cleaned using silane beads as previously described (van nostrand et al., 2016 ). an rna adapter containing a rt primer binding site was ligated to the 3' end of the cleaned and end-repaired rna. the ligated rna was reverse transcribed (rt) into cdna, the rna was degraded using naoh, and a second adapter was ligated to the single stranded cdna. library preparation was the same for input samples except that an initial chemical fragmentation step (90°c for 2 min 30 s in 1x fastap buffer) was included prior to fastap treatment. this chemical fragmentation step was designed to be similar to the fragmentation conditions used for purified halo bound samples. the for staining of infected cells, cells were fixed and permeabilized in 8% formaldehyde 1% triton, and subsequently labelled with primary antibodies raised in sheep to sars-cov-2 at 1/500 dilution, followed by incubation with a rabbit anti-sheep alexa 555 secondary antibody (abcam, ab150182) at 1/1000 dilution and mounted with dapi in the medium (thermo fisher scientific, cat# p36395). 
cells were imaged with a zeiss lsm 880 confocal microscope, with 1 airy unit pinhole for all primary antibody channel acquisitions and pixel size 0.07 µm x 0.07 µm. the objective lens used was a zeiss plan-apochromatic 63x/1.4na m27. structure modeling nsp1 homology model. the predicted model of sars-cov-2 nsp1 was generated using the transform-restrained rosetta (trrosetta) algorithm, a deep learning-based modeling method based on the rosetta energy minimization pipeline with additional distance and interaction restraints generated from co-evolution (yang et al., 2020) . all figures were generated using pymol (www.pymol.org). nsp1-ribosome model. the model of nsp1 bound to the ribosome was generated using modeller version 9.24 (webb and sali, 2016) . the c-terminal sequence of nsp1 (khssgvtrelmrelngg) was modeled using the structure of serbp1 bound to the ribosome (pdb id: 6mte, chain w) as a template. the default modeller parameters were used to create an alignment of nsp1 and serbp1 and to generate the model, and all atoms within 6å of serbp1 were included in the model to define the neighboring environment. twenty models were generated and the model with the lowest dope score was selected to visualize with pymol (delano, 2002) . x-ray crystal structures and cryo-electron microscopy structures were obtained from the protein data bank (www.rcsb.org) (berman et al., 2000) and visualized with pymol (delano, 2002) . for u1 and u2 structural analysis, we used a cryo-em structure of the pre-catalytic human spliceosome (pdb id: 6qx9). for 7sl structural analysis, we used an x-ray crystal structure of the human signal recognition particle (pdb id: 1mfq). to examine human srp in the context of the ribosome, we used a cryo-em structure of the mammalian srp-ribosome complex (pdb id: 3jaj). 
to analyze the ribosomal es27 expansion segment, we superimposed a cryo-em structure of the expansion segment (pdb id: 6sxo) onto the complete ribosome structure (pdb id: 3jaj) using the pymol command "super." finally, for nsp1-18s rrna structural analysis, we used multiple structures of the ribosome, including structures of the pre-40s subunit (pdb id: 6g5h), 48s late-stage initiation complex (pdb id: 6yal), 80s in complex with serbp1 (pdb id: 6mte), and 80s in complex with stm1 (pdb id: 4v88).

nsp1 was cloned into a bacterial expression vector resulting in n-terminally tagged halo-6xhis-tagged nsp1. the nsp1 sequence was pcr amplified from addgene nsp1 entry vector to add an n-terminal 6x his tag and restriction enzyme sites for digestion and ligation into n-terminal halo bacterial expression vector. this construct was transformed into bl21 de3 e. coli (agilent), expanded to a 500ml liquid culture, and grown until od600 reached 1.0. iptg was added to a final concentration of 1mm. after 3 hours of iptg induction, bacteria were centrifuged for 15 min at 5000× g. the pellet was lysed with binding buffer (50mm hepes, ph 7.5, 20mm mgcl2, 600mm nacl, 2mm tcep, 10mm imidazole, 2mm atp, 1% triton x-100) supplemented with atp (2mm), protease inhibitor cocktail (promega), benzonase (sigma) and triton-x 100 (sigma) using 5ml of lysis mix per gram of wet cell paste. the cell suspension was rocked for 20 min at room temperature and then centrifuged at 16,000× g for 20 min at 4°c. the supernatant was incubated with washed imac resin (bio-rad) and rocked for 20 min at room temperature. we loaded the resin-lysate mixture into an appropriately-sized column and washed with 5 column volumes of binding buffer (50mm hepes, ph 7.5, 20mm mgcl2, 600mm nacl, 2mm tcep, 10mm imidazole, 2mm atp, 1% triton x-100) followed by 10 column volumes of wash buffer (50mm hepes, ph 7.5, 600mm nacl, 2mm tcep, 20mm imidazole, ph 8).
recombinant nsp1 (rnsp1) was eluted with 5 column volumes of elution buffer by adding 1 column volume at a time with column flow stopped, collecting eluate after each addition, and waiting 15 min between each elution buffer addition. we dialyzed these eluates with a 10ml spectra-por® float-a-lyzer® g2 (spectrum laboratories) into storage buffer (50mm hepes, ph 7.5, 150mm nacl, 10% glycerol) at 4°c using 2 exchanges, one after 2 hours and then overnight.

pierce 1-step human coupled ivt-dna (thermo fisher scientific) in vitro translation kit was used to measure rnsp1-dependent translation inhibition. bovine serum albumin (bsa) and buffer-only controls were used to control for the addition of excess protein or changes in buffer composition. to measure translation inhibition, 5µl in vitro translation reactions were assembled, scaled according to manufacturer's recommendations. the included control plasmid pcfe-gfp was used to measure translational output of the reactions. gfp fluorescence was measured on a biotek cytation3 plate reader using emission filters for gfp fluorescence. 1.5µm stock dilutions of rnsp1 and bsa were made in storage buffer (50mm hepes, ph 7.5, 150mm nacl, 10% glycerol). subsequent 10-fold dilutions were made in storage buffer to span a concentration range of 1000 nm to 1 nm for each protein in the final reaction. 10 µl of the diluted protein solution was added to the 5µl translation reactions, and incubated for 5 minutes at room temperature prior to the addition of the gfp reporter plasmid. duplicate reactions were made to measure variability for each condition. in addition, a buffer-only control was included to measure the effect of dilution of the translation reaction by the storage buffer. after the 5 minute incubation, 50 ng of gfp reporter plasmid was added to each reaction and incubated at 30°c for 4 hours prior to fluorescence detection.
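the dilution scheme above can be checked with a short calculation: adding 10 µl of a 1.5 µm stock to a 5 µl reaction gives 1.5 µm × 10/15 = 1 µm (1000 nm) in the final reaction, and three further 10-fold dilutions of the stock reach 1 nm. the sketch below reproduces that arithmetic. the function names are hypothetical, and it assumes the small volume of reporter plasmid added afterwards is negligible, which the text does not state.

```python
from typing import List


def final_concentration_nm(stock_nm: float, added_ul: float,
                           reaction_ul: float) -> float:
    """Concentration after mixing `added_ul` of stock into `reaction_ul`."""
    return stock_nm * added_ul / (added_ul + reaction_ul)


def tenfold_series(top_stock_nm: float, steps: int) -> List[float]:
    """Stock concentrations for a serial 10-fold dilution, highest first."""
    return [top_stock_nm / 10 ** i for i in range(steps)]


# 1.5 uM top stock, four dilution points, 10 ul added to a 5 ul reaction.
stocks_nm = tenfold_series(1500.0, 4)
finals_nm = [final_concentration_nm(s, added_ul=10.0, reaction_ul=5.0)
             for s in stocks_nm]
print(finals_nm)
```

because two thirds of the final volume is storage buffer, the buffer-only control described above is needed to separate the effect of the protein from the effect of diluting the translation reaction.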
two microliters from each reaction were measured in duplicate on a biotek cytation3 microplate reader using excitation and emission filters for gfp. sample readings were blanked by subtracting values obtained from the buffer-only control.

promega's rabbit reticulocyte lysate system was also used to assay translation inhibition. to measure translation inhibition, 10µl in vitro translation reactions were assembled, scaled according to manufacturer's recommendations. for each translation reaction, either 10µl of recombinant protein storage buffer or rnsp1 was added, followed by 500ng of mrna. after 4 hours of incubation at 30°c, luciferase was read out using the bright-glo luciferase assay (promega) or gfp fluorescence was measured, both on a biotek cytation3 plate reader.

we assayed translation in hek293t cells transfected with mammalian expression vectors, mrnas, or combinations of these. for mrna transfections of fluorescent protein translation reporters (including unmodified, +sars-cov2 leader sequence, +sl1, +sl2-sl1, and +5nts), dna templates for in vitro transcription were generated with sequences appended to the 5' end of gfp and mcherry (see tables s4 and s5 for primers and templates, respectively) and transcribed using the hiscribe™ t7 arca mrna kit with tailing (new england biolabs). for nsp1 mrna transfection, indicated primers from table s4 were used to add restriction enzyme sites for cloning into pt7cfe1-chis backbone provided in the pierce human 1-step coupled

acquisition files were analyzed with flowjo analysis software.

to assay global protein translation, a sunset assay was performed as previously described (schmidt et al., 2009). we transfected these mammalian expression vectors for nsp1 and gfp into hek293t using biot transfection reagent. after 3 hours, doxycycline (sigma) was added to a final concentration of 2µg/ml. after 24 hours, cells were incubated with puromycin (10µg/ml) for 10 min, then washed with fresh media, and harvested with cold pbs.
pelleted cells were lysed for 10 min on ice (mixing after 5 min) with 100µl ripa buffer supplemented with protease inhibitor cocktail (promega). insoluble debris was pelleted by centrifuging at 12,500 × g for 2.5 minutes and the supernatant was run on a bolt™ 4-12% bis-tris plus gel (thermo fisher scientific). proteins were then transferred to nitrocellulose using the iblot transfer system (thermo fisher scientific) and western blotting was carried out using an anti-puro antibody (clone 12d10, emd millipore).

sunset in sars-cov-2 infection was performed as above with the following modifications. cells were infected or not (mock) with sars-cov-2, and at 48 hpi cells were incubated with puromycin (10µg/ml) for 20 min. media was aspirated and cells were lysed directly in 2x laemmli's buffer (biorad), heated at 95ºc for ten minutes and run on a 4-12% nupage gel (thermo fisher scientific). proteins were transferred to nitrocellulose using the iblot transfer system and probed as above.

to assay srp-dependent membrane protein transport to the cell surface, we monitored surface arrival of exogenously expressed nerve growth factor receptor (ngfr) by flow cytometry in the presence of nsps. mammalian expression vectors were exchanged for versions that contained an ires-ngfr to co-express a membrane reporter; thus, for these experiments, lr reactions were carried out with destination vector pb-6xhis-gfp-dest-ires-ngfr. the resulting expression vectors drive protein expression by a dox-inducible promoter, contain the rtta needed for dox induction, and produce an n-terminally tagged his-gfp fusion protein and a co-expressed ngfr. the gfp here is an enhanced gfp containing an amino acid substitution (a205k) to generate a monomeric variant based on previous literature (alberti et al., 2018).
we transfected these mammalian expression vectors for nsp8, nsp9, nsp1∆rc mutant and

to knock down srp19 and srp54, sirnas targeting each (dharmacon cat# l-019729-01-0005 and l-005122-01-0005, respectively) were transfected into hek293t cells using lipofectamine rnaimax (invitrogen) according to manufacturer's protocols. to validate knockdown, transfected cells were assayed by qpcr using primer sets (table s2) to amplify each target as well as the normalizer calm3. transfections were carried out 48 hours prior to assaying cells, either by qpcr, membrane reporter, or membrane sunset (see below) experiments.

calu3 and vero cells were transfected with mrnas encoding leader-ngfr and leader-gfp using the transit-mrna transfection kit (mirus) and subsequently infected with sars-cov-2 at an moi of 0.1. after 24 hours, cells were washed with pbs, trypsinized and fixed in 4% pfa for 20 minutes before staining with biotinylated anti-ngfr (biolegend) and anti-sars-cov-2 spike antibody (sino) and subsequently stained with pe-labeled anti-rabbit (thermo, p-2771mp) and pacblue-labeled streptavidin (thermo, s1222). facs was performed on a macsquant flow cytometer and analyzed using flowjo analysis software; facs distributions were compared using a 2-tailed kolmogorov-smirnov test. for these experiments, rna was transcribed from a pcr template (see table s4) using the hiscribe t7 arca mrna kit (with tailing).

to assay transport to the cell surface of all plasma membrane proteins, the sunset assay was adapted to puro-label surface proteins as previously described (schmidt et al., 2009), and read out by flow cytometry. briefly, cells were incubated with puromycin as described above, followed by two quick washes and a chase with fresh complete media for 50 min. cells were lifted with 1mm edta as described above and stained with an anti-puro antibody (clone 12d10, emd millipore) conjugated to alexa-647.
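As an illustration of the 2-tailed Kolmogorov-Smirnov comparison of flow cytometry distributions mentioned above, the following minimal sketch computes the two-sample statistic and an asymptotic p-value using only the Python standard library. The source does not state which software performed the test, so this is not the authors' implementation; the function name is hypothetical, and the p-value uses the standard large-sample series approximation with a small-sample correction term. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sided, two-sample KS statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so the largest
    # absolute difference is attained at one of the data points.
    for x in a + b:
        diff = abs(bisect_right(a, x) / n - bisect_right(b, x) / m)
        d = max(d, diff)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges slowly here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Made-up intensity values, for illustration only.
    control = [1.0, 1.2, 1.4, 1.1, 1.3, 1.5, 1.25, 1.35]
    treated = [0.6, 0.7, 0.9, 0.8, 1.0, 0.75, 0.85, 0.95]
    print(ks_two_sample(control, treated))
```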
for these experiments, nsp was expressed from the same vector described above for membrane reporter assays. fluorescence intensity measurements were taken for gfp and alexa-647 on a macsquant flow cytometer and analyzed using flowjo analysis software; distributions were compared using a 2-tailed kolmogorov-smirnov test.

to assess splicing efficiency, exons 5-6 of mouse irf7 (enmust00000026571.10) containing its endogenous intron were fused upstream of a 2a self-cleaving peptide and egfp and cloned into an mscv vector (pig, addgene) (mayr and bartel, 2009). this construct was co-transfected into hek293ts with nsp16 or gfp and measured 24 hours after transfection by flow cytometry (macsquant) and analyzed using flowjo analysis software.

sars-cov2 or mock infected calu3 cells and nsp16- or gapdh-expressing hek293ts were labeled with 5-ethynyl-uridine (5eu; jena bioscience) by adding 5eu-containing media to cells for 20 min at a final concentration of 1mm, as previously described (jao and salic, 2008). after the pulse label, cells were washed with warm pbs and lysed in rlt buffer (qiagen). total rna was isolated from cells using manufacturer's protocols for qiashredder and rneasy rna isolation (both qiagen), followed by turbo dnase treatment (ambion, thermo scientific), and zymo rna clean and concentrate. for each sample, 2µg of rna was used for ligation of a unique barcoded rna adaptor, following the relevant steps in the protocol described above in library construction of rna-seq libraries. samples were then pooled before proceeding to biotinylation steps. to biotinylate 5eu-labeled rna, samples were first mixed, in order, with water, hepes (100 mm), biotin picolyl azide (1 mm; click chemistry tools) and ribolock rnase inhibitor, then added to premixed cuso4 (2 mm) and thpta (10mm), and finally added to freshly prepared sodium ascorbate (12mm), as previously described (hong et al., 2009).
the click reaction was incubated for 1 hour at 25ºc with 1000rpm shaking on an eppendorf thermomixer followed by rna purification using the >17nt protocol for zymo clean and concentrate.

we completed three rounds of sequential capture on streptavidin beads to isolate nascent transcripts (see figure s3b). to capture biotinylated rna, myone streptavidin c1 dynabeads (thermofisher scientific) were first washed three times in urea buffer (10mm hepes, ph 7.5, 10mm edta, 0.5m licl, 0.5% triton x-100, 0.2% sds, 0.1% sodium deoxycholate, 2.5mm tcep, 4m urea) followed by three additional washes in m2 buffer (20mm tris, ph 7.5, 50mm nacl, 0.2% triton x-100, 0.2% sodium deoxycholate, 0.2% np-40). washed beads were mixed with 3 parts 4m urea buffer and 1 part biotinylated rna and incubated for 60 min with 900rpm thermomixer shaking at room temperature. after magnetic separation, beads were washed 3 times with m2 buffer followed by 3 washes with urea buffer at 37 ºc at 750rpm for 5 min. rna was eluted from beads in 2 sequential elutions by incubating with elution buffer (5.7m guanidine thiocyanate, 1% n-lauroylsarcosine; both sigma) at 65 ºc for 2 minutes, repeating with more elution buffer for a second elution. the elutions were pooled, diluted with urea buffer, incubated with pre-washed streptavidin beads, washed, and eluted for 2 additional rounds exactly as described above for a total of 3 sequential captures. final elutions were pooled, cleaned with zymo rna clean and concentrate following manufacturer's protocols, and carried through rna-seq library preparation as described above starting with the reverse transcription step.

hek-blue™ isg cells were seeded in 96 well plates, transfected with nsp1 mammalian expression vectors using biot and stimulated with 50 ng/ml human ifn-β (r&d systems). supernatants were assayed for alkaline phosphatase as per manufacturer instructions using quanti-blue reagent (invivogen).
hek-293t cells were seeded in 6 well plates, transfected with either halo-tagged gapdh, nsp1, nsp8 and nsp9 in combination, or nsp16 mammalian expression vectors using biot. 24 hours later, the media was replaced with media containing 50 ng/ml human ifn-β (r&d systems). expression was assayed using live cell halo-imaging. halo-tmr ligand was diluted 1:200 in media and added to the culture for a 1:1000 final dilution. samples were incubated 30 minutes at 37°c, 5% co2 and then the media was aspirated. wells were rinsed twice with pbs, then media was added back to the wells. samples were incubated 30 minutes at 37°c, 5% co2 to allow uncoupled ligand to diffuse out of the cells. media was then aspirated and replaced, and cells were imaged by widefield fluorescence microscopy. cultures were ultimately harvested for rna 24 hours later, or 48 hours post transfection.

a549s were seeded in 6 well plates, transfected with nsp1 mammalian expression vectors using lipofectamine 2000 and stimulated with 1 µg/ml hmw poly(i:c) (invivogen) 24h after transfection. supernatant was assayed for secreted ifn-β by elisa (human ifn beta elisa, high sensitivity, pbl) 24 hours after stimulation, and rna from cells was purified and assessed for isg gene expression as normalized to gapdh expression (sybr green master mix, bio-rad). primers used for qpcr are listed in table s2.

sars-cov-2 leader sequence was appended to the 5' end of gfp and mcherry reporter templates via pcr. pcr templates were then transcribed using the hiscribe t7 arca mrna kit (with tailing). leader mutants, including sl1 only, sl1/sl2 swap, and +5nts mutants, were likewise appended to the 5' end of fluorescent reporter templates via pcr and transcribed using the hiscribe t7 arca kit. mrna reporters were transfected in hek-293t cells with lipofectamine messengermax.
to measure fluorescence of mcherry and gfp reporters, 24 hours post transfection cells were either lifted with pbs and transferred into black 96 well plates for fluorescence readout on a biotek cytation 3 or trypsinized and processed for flow cytometry. sequence alignment and analysis. for halo purifications and rna binding mapping sequencing reads were aligned to a combined genome reference containing the sequences of structural rnas (ribosomal rnas, snrnas, snornas, 45s pre-rrna) and annotated mrnas (refseq hg38) using bowtie2. to distinguish between the nascent pre-ribosomal rna and mature 18s, 28s, and 5.8s rrna, we separated each of the components of the 45s into separate sequence units for alignment (e.g. its, ets). we excluded all low quality alignments (mapq < 2) from the analysis. for mrna analysis, we removed pcr duplicates using the picard markduplicates function (https://broadinstitute.github.io/picard/). for each rna, we enumerated 100 nucleotide windows across the entire rna. for each window, we calculated the enrichment by computing the number of reads overlapping the window in the protein elution sample divided by the total number of reads within the protein elution sample. we normalized this ratio by the number of reads in the input sample divided by the total number of reads in the input sample. because all windows overlapping a gene should have the same expression level in the input sample (which represents rna expression), we estimated the number of reads in the input as the maximum of either (i) the number of reads over the window or (ii) the median read count over all windows within the gene. this approach provides a conservative estimation of enrichment because it prevents windows from being scored as enriched if the input values over a given window are artificially low, while at the same time accounting for any non-random issues that lead to increases in read counts over a given window (e.g. 
fragmentation biases or alignment artifacts leading to non-random assignment or pileups).

we calculated a multiple testing corrected p-value using a scan statistic, as previously described (guttman et al., 2009, 2010). briefly, n was defined as the number of reads in the protein elution plus the number of reads in the control sample. p was defined as the total number of reads in the protein elution sample divided by the sum of the protein elution sample total reads and total reads in the control sample. w was the size of the window used for the analysis (100 nucleotides). the scan statistic p-value was defined using the poisson estimations based on standard distributions previously described (naus, 1982).

because rna within input samples is fragmented differently than the protein elution samples, we noticed that the overall positional distribution of protein elution samples was distinct from input distributions. accordingly, we used the remaining protein elution samples (rather than input) as controls for each protein. specifically, this enabled us to test whether a given protein is enriched within a given window relative to all other viral and control proteins. enrichments were computed as described above. these values are plotted in figure 1 and table 1.

igv (robinson et al., 2011) and were generated by either: (i) computing the enrichment for each nucleotide as described above. in this case, the read count for each nucleotide was computed as the total number of reads that overlapped the nucleotide. (ii) counting the number of rt stop sites at a given nucleotide. in this case, we computed the alignment start position of the second-in-pair read and computed a count for each nucleotide. we normalized this count by the total number of reads in the sample to account for the sequencing depth generated.
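The window-level enrichment defined earlier in this section (elution read fraction divided by input read fraction, with the input count taken as the larger of the window count and the gene-wide median) can be sketched as follows. This is only an illustration under stated assumptions: the function name, argument names, and the example counts are hypothetical, windows are assumed to be pre-counted 100-nucleotide bins for a single gene, and the scan-statistic p-value is not implemented here.

```python
from statistics import median


def window_enrichment(elution_counts, input_counts,
                      elution_total, input_total):
    """Enrichment per window for one gene.

    elution_counts, input_counts: read counts per 100-nt window.
    elution_total, input_total: total reads in each sample.
    """
    gene_median = median(input_counts)
    enrichments = []
    for elution, inp in zip(elution_counts, input_counts):
        # Conservative input estimate: never below the gene-wide median,
        # so an artificially low input window cannot inflate enrichment.
        conservative_input = max(inp, gene_median)
        if conservative_input == 0:
            enrichments.append(float("nan"))
            continue
        # (elution / elution_total) / (conservative_input / input_total)
        enrichments.append(
            (elution * input_total) / (conservative_input * elution_total)
        )
    return enrichments


if __name__ == "__main__":
    # Made-up counts: the middle window has a low input count of 1,
    # which is replaced by the gene median of 5.
    print(window_enrichment([10, 40, 10], [5, 1, 5], 1000, 1000))
```

In the example, the middle window would score 40-fold if its raw input count of 1 were used; with the median floor it scores 8-fold, which is the conservative behavior the text describes.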
we then normalized this ratio by the same ratio computed for the control sample (merge of all other protein samples) for each nucleotide. heatmaps were generated using morpheus (https://software.broadinstitute.org/morpheus/). all values were included if they contained a significant 100nt window with a p-value<0.001 (see above) and minimum enrichment of 3-fold above the control sample.

gene ontology analysis. the 66 non-n enriched mrnas were analyzed against the gene ontology biological processes and reactome gene sets using the molecular signatures database (msigdb) (liberzon et al., 2015). significantly enriched gene sets with an fdr<0.05 were used. to ensure that significant gene sets were not being driven by the multiple ribosomal proteins or histone proteins, these analyses were also carried out excluding these proteins.

sequenced reads were demultiplexed according to barcoded rna adaptor sequences ligated to each respective sample. trimmomatic (https://github.com/timflutre/trimmomatic) was used to remove any contaminating illumina primer sequences in the reads and low quality reads. demultiplexed and trimmed files were then aligned to a hg19 reference genome using the splice-aware star aligner (https://github.com/alexdobin/star). alignments were then deduplicated for pcr duplicates using picard markduplicates (https://broadinstitute.github.io/picard/). aligned read-fragments were defined as read1 and read2 contained within a paired-end read fragment along with the insert between these two reads. we defined a set of high-quality represented isoforms per gene using the appris database (rodriguez et al., 2013). all read-fragments that spanned any 3' splice site within an isoform of one of these genes were retained. for each 3' splice site spanning fragment, we classified the read-fragment as a spliced fragment if it spanned an exon-exon junction (e.g.
aligned entirely within 2 distinct exons) or an unspliced fragment if it spanned an intron-exon junction (e.g. one of the reads was contained, or partially contained, within the intron). for each isoform, we computed an unspliced ratio as the total number of read-fragments classified as unspliced divided by the total number of read-fragments spanning 3' splice sites within that gene. to ensure that the splicing ratio that we measured is a reliable metric and not inflated/deflated due to low read counts, we only included genes that contained at least 10 read-fragments in each sample and where the total number of reads in the control and sample conditions (when merged together) contained a significant number of reads to reliably measure a difference between the two groups as measured by a hypergeometric test (p<0.01). because different genes contain different baseline splicing ratios due to gene length and coverage, we computed a change in the splicing ratio for each gene independently. to do this, we subtracted the unspliced ratio for each sample from the average unspliced ratio for that gene in all of the control samples. we plotted the overall distribution of these differences in splicing ratios as violin plots for each sample. if there is no change in splicing ratio, we would expect that some genes would have higher splicing ratios and others lower splicing ratios but that the overall distribution would be centered around 0.

total rna-seq libraries were generated from the same mock infected and sars-cov-2 virally infected calu3 samples treated with 5eu. prior to 5eu purification, total rna was taken and an rna-seq library constructed as described above using barcoded rna adapters. cytoplasmic ribosomal rnas (18s and 28s) were depleted using the nebnext ribosomal rna depletion kit (neb e6310l) per manufacturer's recommendations.
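The unspliced-ratio comparison described in the splicing analysis above can be sketched as below. Several details are assumptions made for illustration: the data layout (per-gene tuples of unspliced and spliced fragment counts), the function names, and the sign convention (sample ratio minus control mean, so that positive values mean more unspliced fragments; the wording in the text could be read with the opposite sign). The hypergeometric filter is omitted; only the minimum of 10 read-fragments per sample is applied.

```python
from statistics import mean


def unspliced_ratio(unspliced, spliced):
    """Fraction of 3' splice site spanning fragments that are unspliced."""
    return unspliced / (unspliced + spliced)


def splicing_change(sample, controls, min_fragments=10):
    """Per-gene change in unspliced ratio relative to the control mean.

    sample:   {gene: (unspliced, spliced)} for one sample.
    controls: {gene: [(unspliced, spliced), ...]}, one tuple per control.
    """
    changes = {}
    for gene, (unspliced, spliced) in sample.items():
        control_counts = controls.get(gene, [])
        if not control_counts or unspliced + spliced < min_fragments:
            continue
        if any(u + s < min_fragments for u, s in control_counts):
            continue
        control_mean = mean(unspliced_ratio(u, s) for u, s in control_counts)
        changes[gene] = unspliced_ratio(unspliced, spliced) - control_mean
    return changes


if __name__ == "__main__":
    # Made-up counts; gene_b is dropped because it has too few fragments.
    sample = {"gene_a": (30, 70), "gene_b": (2, 3)}
    controls = {"gene_a": [(10, 90), (20, 80)], "gene_b": [(1, 4)]}
    print(splicing_change(sample, controls))
```

The resulting per-gene differences are the values whose distribution would be summarized in a violin plot; with no global change in splicing they would be expected to center on 0.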
demultiplexed reads were aligned using bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) to custom genomes encoding classical noncoding rnas (ncrnas) or human messenger rnas (mrnas). expression levels were computed for each mrna by counting the total number of sequencing reads aligned to the mature mrna. to normalize across the different libraries, we computed the read counts for each sample that align to non-spliced structural non-coding rnas, excluding rrna but including snrnas, 7sl, 7sk, etc. we then divided each mrna count by the sum of all ncrna counts. this normalized value for each gene per sample was then converted into a fold-change by dividing it by the mean value for both mock infected samples. the fold change of each gene relative to mock was plotted across all mrnas as a violin plot.

and enrichment >3-fold for any of the viral proteins is reported.

• nsp16 binds mrna recognition domains of u1/u2 snrnas and disrupts mrna splicing
• nsp1 binds in the mrna entry channel of the ribosome to disrupt protein translation
• nsp8 and nsp9 bind the signal recognition particle and disrupt protein trafficking
• these disruptions to protein production suppress the interferon response to infection

in brief - sars-cov-2 proteins directly engage host rnas and dysregulate rna-based processes to suppress the interferon response

); office of the vice president for research at uvm m.g.
is a nyscf-robertson investigator.

signal recognition particle: an essential protein-targeting machine a user's guide for phase separation assays with purified proteins recognition of double-stranded rna and activation of nf-κb by toll-like receptor 3 visualizing late states of human 40s ribosomal subunit maturation the proximal origin of sars-cov-2 sars-cov-2 (covid-19) by the numbers innate immune evasion strategies of dna and rna viruses the structure of the eukaryotic ribosome at 3.0 å resolution the protein data bank mutagenesis of the murine hepatitis virus nsp1-coding region identifies residues important for protein processing, viral rna synthesis, and viral replication structures of translationally inactive mammalian ribosomes. elife 7 mers-cov 4b protein interferes with the nf-κb-dependent innate immune response during infection mechanism of 5' splice site transfer for human spliceosome activation genomics functional analysis and drug screening of sars-cov-2 rig-i and other rna sensors in antiviral immunity a new group of conserved coactivators that increase the specificity of ap-1 transcription factors cop9 signalosome: a multifunctional regulator of scf and other cullin-based ubiquitin ligases crystal structure and functional analysis of the sars-coronavirus rna cap 2′-o-methyltransferase nsp10/nsp16 complex pymol molecular graphics system the coding capacity of sars-cov-2 a common strategy for host rna degradation by divergent viruses minireview recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission chromatin signature reveals over a thousand highly conserved large non-coding rnas in mammals ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincrnas impaired type i interferon activity and inflammatory responses in severe covid-19 patients mhc class i antigen presentation: learning
from viral evasion strategies small molecules targeting viral rna dynamic switch of the signal recognition particle from scanning to targeting analysis and optimization of copper-catalyzed azide-alkyne cycloaddition for bioconjugation notch1 regulates maturation of cd4+ and cd8+ thymocytes by modulating tcr signal strength exploring rna transcription and turnover in vivo by using click chemistry sensing of rna viruses: a review of innate immune receptors involved in recognizing rna virus invasion a two-pronged strategy to suppress host protein synthesis by sars coronavirus nsp1 protein differential roles of mda5 and rig-i helicases in the recognition of rna viruses poliovirus 2apro increases viral mrna and polysome stability coordinately in time with cleavage of eif4g a flexible genome-scale resource of sars-cov-2 coding sequence clones the architecture of sars-cov-2 transcriptome inducible transgene expression in human ips cells using versatile all-in-one piggybac transposons multiple anti-interferon actions of the influenza a virus ns1 protein iclip reveals the function of hnrnp particles in splicing at individual nucleotide resolution induced structural changes of 7sl rna during the assembly of human signal recognition particle quality and quantity control of gene expression by nonsense-mediated mrna decay cellular response to small molecules that selectively stall protein synthesis by the ribosome the molecular signatures database hallmark gene set collection translational control by viral proteinases middle east respiratory syndrome coronavirus nsp1 inhibits host gene expression by selectively targeting mrnas transcribed in the nucleus while sparing mrnas of cytoplasmic origin type i interferon susceptibility distinguishes sars-cov-2 from sars-cov halotag: a novel protein labeling technology for cell imaging and protein analysis down-regulation of jab1, hif-1α, and vegf by
moloney murine leukemia virus-ts1 infection: a possible cause of neurodegeneration extensive coronavirus-induced membrane rearrangements are not a determinant of pathogenicity programmed delayed splicing: a mechanism for timed inflammatory gene expression widespread shortening of 3′utrs by alternative cleavage and polyadenylation activates oncogenes in cancer cells human cytomegalovirus inhibits major histocompatibility complex class ii expression by disruption of the jak/stat pathway inhibition of stress granule formation by middle east respiratory syndrome coronavirus 4a accessory protein facilitates viral translation, leading to efficient virus replication severe acute respiratory syndrome coronavirus nsp1 suppresses host gene expression, including that of type i interferon, in infected cells coronavirus nonstructural protein 1: common and distinct functions in the regulation of host and viral gene expression approximations for distributions of scan statistics robust transcriptome-wide discovery of rna-binding protein binding sites with enhanced clip (eclip) recognition of cognate transfer rna by the 30s ribosomal subunit jab1 mediates cytoplasmic localization and degradation of west nile virus capsid protein the expansion segments of 28s ribosomal rna extensively match human messenger rnas integrative genomics viewer appris: annotation of principal and alternative splice isoforms sunset, a nonradioactive method to monitor protein synthesis a u1 snrna:pre-mrna base pairing interaction is required early in yeast spliceosome assembly but does not uniquely define the 5' cleavage site role of nonstructural proteins in the pathogenesis of sars-cov-2 structural insights into the mammalian late-stage initiation complexes rna-rna and rna-protein interactions in coronavirus replication and transcription type i interferons in host defense the nsp9 replicase protein of sars-coronavirus severe acute respiratory syndrome coronavirus nsp1 facilitates efficient propagation in 
cells through a specific translational shutoff of host mrna the hepatitis b virus x protein enhances ap-1 activation through interaction with jab1 structural basis for translational shutdown and immune evasion by the nsp1 protein of sars-cov-2 signal peptide-binding drug as a selective inhibitor of co-translational protein translocation structures of the scanning and engaged states of the mammalian srp-ribosome complex the genetic sequence, origin, and diagnosis of sars-cov-2 comparative protein structure modeling using modeller the jnk signal transduction pathway metap-like ebp1 occupies the human ribosomal tunnel exit and recruits flexible rrna expansion segments recognition of viruses by cytoplasmic sensors improved protein structure prediction using predicted interresidue orientations influenza virus ns1 protein-rna interactome reveals intron targeting viral and host factors related to the clinical outcome of covid-19 structural basis for interaction of a cotranslational chaperone with the eukaryotic ribosome interferon-α2b treatment for covid-19 sars-cov-2 viral load in upper respiratory specimens of infected patients we thank fritz roth for clones; marko jovanovic, bil clemons, shu-ou shan, and jamie