key: cord-319447-xanewi59 authors: Sun, Jiya; Ye, Fei; Wu, Aiping; Yang, Ren; Pan, Mei; Sheng, Jie; Zhu, Wenjie; Mao, Longfei; Wang, Ming; Huang, Baoying; Tan, Wenjie; Jiang, Taijiao title: Comparative transcriptome analysis reveals the intensive early-stage responses of host cells to SARS-CoV-2 infection date: 2020-05-01 journal: bioRxiv DOI: 10.1101/2020.04.30.071274 sha: doc_id: 319447 cord_uid: xanewi59 Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a widespread outbreak of highly pathogenic COVID-19. It is therefore important and timely to characterize interactions between the virus and host cell at the molecular level to understand its disease pathogenesis. To gain insights, we performed high-throughput sequencing that generated time-series data simultaneously for bioinformatics analysis of virus genomes and host transcriptomes implicated in SARS-CoV-2 infection. Our analysis results showed that the rapid growth of the virus was accompanied by an early intensive response of host genes. We also systematically compared the molecular footprints of the host cells in response to SARS-CoV-2, SARS-CoV and MERS-CoV. Upon infection, SARS-CoV-2 induced hundreds of up-regulated host genes hallmarked by a significant cytokine production followed by virus-specific host antiviral responses. While the cytokine and antiviral responses triggered by SARS-CoV and MERS-CoV were only observed during the late stage of infection, the host antiviral responses during the SARS-CoV-2 infection were gradually enhanced lagging behind the production of cytokine. The early rapid host responses were potentially attributed to the high efficiency of SARS-CoV-2 entry into host cells, underscored by evidence of a remarkably up-regulated gene expression of TPRMSS2 soon after infection. Taken together, our findings provide novel molecular insights into the mechanisms underlying the infectivity and pathogenicity of SARS-CoV-2. Coronavirus disease 2019 (COVID-19) triggered by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is currently affecting global health. The SARS-CoV-2 is the third highly pathogenic coronavirus following SARS-CoV and MERS-CoV that cause severe accurate respiratory symptoms in humans. Since December 2019, this virus has caused more than 80 thousand COVID-19 cases in China. Nowadays, the number of infections in countries outside China is growing rapidly. The most remarkable feature of the SARS-CoV-2 incidences and epidemiology is its great capacity for human-to-human transmission [1] . Clinically, the majority of COVID-19 patients have mild and moderate symptoms, and the elderly appear to have severe symptoms [2] . Based on the analysis of China data, the COVID-19 case-fatality rate was estimated as around 4.0% (3,341deaths over 82,249 confirmed cases of SARS-CoV-2 infection) [3] , lower than those of SARS and MERS [4] . However, due to the large-scale infected population, the SARS-CoV-2 has already caused more than ninety thousand deaths as of April 11 th 2020, sowing great social panic around the world. While recent efforts have been focused on transcriptome analysis of host responses to SARS-CoV-2 infection at a certain time point in certain cell lines [5, 6] , the transcriptional dynamics of host responses to the virus infection has remained largely unexplored. Generally, once the virus enters the cell, the host innate immune responses, such as the interferon-mediated antiviral responses and cytokine production, have a pivotal role in suppressing the virus replication, which, if inadequate, might contribute to the viral pathogenesis. This hypothesis has been supported by our previous study, which has shown that the high pathogenicity of avian influenza virus is associated with abnormal coordination between interferon-mediated antiviral responses and cytokine production in host cells [7] . Similar to both SARS-CoV and MERS-CoV, which induce the overactivation of cytokines [8] , increased cytokine levels are also observed in patients hospitalized with COVID-19 [9] . Transcriptome analysis of in vitro host cells shows that SARS-CoV and MERS-CoV elicit distinct responses to the expression of the host genes [10] . Until now, the time-series gene-expression profiling of the host response to SARS-CoV-2 remains unknown and thus is urgently needed uncovering its pathogenesis. In this study, we used the SARS-CoV-2 strain isolated from patients [11] to infect in vitro Calu-3 cells, and performed RNA sequencing to determine the time-series transcriptome profiling data of the host. We established the host response patterns for SARS-CoV-2 by comprehensive analysis of the transcriptomic profiles from SARS-CoV-2, SARS-CoV and MERS-CoV. These results provide profound new insights into the pathogenesis and progression of the COVID-19 disease caused by SARS-CoV-2, illuminating new strategies for the prevention and control of SARS-CoV-2 transmission and eventually leading to a cure of the COVID-19 disease. Cells and virus. Calu-3 human airway epithelial cells(ATCC, HTB-55) were cultured in minimum essential media (MEM, HyClone) supplemented with 10% fetal bovine serum(FBS), 1% MEM NEAA, and 100 U/mL penicillin-streptomycin solution (Gibco, Grand Island, USA) at 37 °C in a humidified atmosphere of 5% (v/v) CO2. Vero cells (ATCC, CCL-81) were cultured at 37 °C in Dulbecco's modified Eagle's medium (DMEM, Gibco) supplemented with 10% FBS (Gibco) in the atmosphere with 5% CO2. SARS-CoV-2 strain BetaCoV/Wuhan/IVDC-HB-01/2019 (C-Tan-HB01, GISAID accession no. EPI_ISL_402119) was isolated from a human patient [11] . Viruses were harvested and viral titrations were performed in Vero cells using plaque assay. [12] . The cleaned data were mapped to the human GRCh38 reference genome using STAR aligner (v2.7.2a) [13] . The htseq-count command was used to count reads mapped to each gene [14] . The R package DESeq2 was applied to further identify Differentially Expressed Genes (DEGs) (FDR<0.05, |log2FC|>=1) [15] . The unmapped reads against the entire human genome were further aligned to the reference genome of SARS-CoV-2 (EPI_ISL_402119). Virus genome annotation was based on our previous work [16] . GO enrichment analysis was performed by Fisher's exact test with the 19932 human protein-coding genes as a background in R. For analysis of microarray data of SARS-CoV and MERS-CoV, normalization and identification of DEGs (FDR<0.05, |log2FC|>=1) were conducted using the R package limma [17] . We carried out in time-course experiments to identify dynamic changes in transcripts in response to SARS-CoV-2 based on the infected and mock-infected groups across four time points (0, 7, 12 and 24 hpi), in which three biologically independent replicates for each treatment group were used for constructing cDNA libraries. The Calu-3 human airway epithelial cell line, a model of human respiratory disease [16] , was used as the host cell of SARS-CoV-2, subjected to the same MOI and host cell used in the previous analyses of SARS-CoV and MERS-CoV infections. After the total RNA isolation and sequencing, we obtained the host transcriptomes, as well as the genomes and transcripts of viruses. The high-throughput sequencing resulted in an average of 49 million pairedend reads per sample, and the sequencing quality was high with a mean mapping rate of unique reads at approximately 72% among mock samples (Supplementary Table 1 ). The quality control of all samples was assessed by the PCA analysis based on normalized counts from DESeq2, which indicated that high quality was achieved given that the majority of samples were well clustered except only one sample from the infection group at 24 hpi that was removed before further analysis ( Figure 1A ). To evaluate the growth rate of SARS-CoV-2, we calculated the RNA level of the virus represented by unique reads mapping rates at different time points. Our results showed that in general the virus reads increased sharply from 1.4% to 61.2% while reads mapped to the host genome dropped rapidly from 67.2% to 11.4% ( Figure 1B) , suggesting a rapid replication of the virus within 24 hours. From the results, at the earliest time point (0 hpi), virus produced high-levels of viral genome RNA as evidenced by relatively even coverage depth across the whole genome ( Figure 1C ). Interestingly, we found that there was a significantly active transcription of the 3' end of SARS-CoV-2 at 7 hpi, especially for the M, 6, 7a, 7b, 8b and N genes ( Figure 1C) which could play important roles in the antagonism with host immune response [18, 19] . After that, the relatively even depth distribution of reads along viral genome was again observed at panels of 12 and 24 hpi. This time-dependent patterns of virus replication and transcription was most likely to play critical roles in the pathology of SARS-CoV-2. To elucidate the global changes of host gene expression along with virus growth, we identified the overall up-and down-regulated DEGs during SARS-CoV-2 infection ( Figure 1D and Supplementary Table 2 ). As shown in Figure 1D , during the early stage of infection before 7 hpi, there were many more up-regulated genes than down- To investigate specific host responses during SARS-CoV-2 infection, we performed a comparative transcriptome analysis by integrating two public host transcriptomes of SARS-CoV (GSE33267) [20] and MERS-CoV (GSE45042) [10] Figure 3B ). In addition, a list of the virus-specific antiviral-related genes was identified, including PARP10 [21] and CMPK2 [22] for SARS-CoV-2, BST2 [23] , ITITM1 and USP41 for SARS, and PARP4 [24] for MERS-CoV ( Figure 3B ). Secondly, we further used a set of 113 human cytokines to quantify host inflammation responses between three viruses. The 113 cytokines from the CytoReg database were often cited by various publications and play a primary role in the immune system [25] . Our results showed that, for SARS-CoV-2, the level of cytokine production was highly induced at 0 hpi, decreased at 7 hpi, and then slowly recovered thereafter ( Figure 3C ). and MERS-CoV. Our analysis also provided evidence that SARS-CoV-2 had more cytokines in common with SARS-CoV than with MERS-CoV ( Figure 3D Next, to gain possible explanations for the distinct patterns in host antiviral capacity and cytokine production during SARS-CoV-2 infection, dynamic expression of four types of key genes were evaluated, including virus receptors for cell entry, pathogen recognition receptors (PRRs) for an innate immune startup, regulator genes for induction of antiviral-related genes and interferon production (Figure 4) . For the three cell entry related genes (ACE2 as the receptor of SARS-CoV and SARS-CoV-2 [28, 29] , DPP4 as the receptor of MERS-CoV [30] and protease TMPRSS2 for S protein priming of SARS-CoV-2 [27] ), we observed the dramatic changes in TMPRSS2 expression with very early induction during SARS-CoV-2 infection, and the slightly down-regulated expression of ACE2 in cells infected with SARS-CoV-2 and SARS-CoV, whereas DPP4 was more up-regulated in MERS-CoV (Figure 4 ). For the two PRRs, DDX58 is a canonical RIG-I-like receptor for RNA virus recognition [31] , and TLR3 is a Toll-like receptor playing important roles in initiating a protective innate immune response to highly pathogenic coronavirus infections [32] . We observed that all three viruses had a notably up-regulated expression of DDX58 while only MERS-CoV had a suppressed TLR3 at the 24 hpi (Figure 4) , which is consistent with the fact that decreased expression of TLR3 contributes to the pathology of highly pathogenic coronavirus infections [32] . Among the four regulator genes, IRF7 is responsible for the expression of most IFN-α subtypes and the type I IFN amplification loop [33] , and IRF9, STAT1 and STAT2 form the ISGF3 complex that binds to interferon-stimulated response elements and thereby induces the expression of interferon-stimulated genes [34] . As expected, gradually up-regulation of the four primary regulator genes was observed for all three viruses (Figure 4 ). At last, we found a significant difference in the expression of IFNB1 between SARS-CoV-2, SARS-CoV and MERS-CoV, indicating that IFNB1 likely accounted for the observed variations of the host antiviral capacities among three viruses (Figure 4) . Taken together, early induction of TMPRSS2 and gradually increased expression level of IFNB1 were likely responsible for the distinct host immune response patterns of SARS-CoV-2 infection. Using time-series profiling of the virus genome and host transcriptome at the same time during SARS-CoV-2 infection coupled with comparative transcriptome analysis, we found that, compared to SARS-CoV and MERS-CoV, SARS-CoV-2 induces strong host cell responses at the very early stage of infection that not only favor its high infectivity to host cells but also restrict its pathogenesis. Here we sequenced the transcriptomes of SARS-CoV-2 and virus-infected host cells simultaneously during the early stages of infection, providing a robust reference dataset to speculate the antagonistic pattern between pathogen and host cells. To summarize, our findings showed that SARS-CoV-2 induced the significantly high expression of the cellular serine protease TMPRSS2 at 0 hpi to help the entry of viral particles into cells [28] (Figure 4 ). At the same time, host cell initiated an immediate response for the invasion of SARS-CoV-2 virus ( Figure 1D ). Then, the virus successfully suppressed the acute response of host cells for fast proliferation by increasing the transcripts of its 3' genome end, including M, 6, 7a, 7b, 8 and N genes which were consistent with their reported regulations to host immune response [18, 19] . As a response from hosts cell, a number of antiviral pathways and cytokine productions were up-regulated to resist the virus infection (Figures 2 and 3 ). In particular, several metabolism-associated pathways were down-regulated at 12hpi and 24 hpi (Figure 2 ). After the antagonistic cycle, a dramatic proliferation of viral particles was detected in the early infection of host cells ( Figure 1B) , which could possibly be an explanation for the fast spread of SARS-CoV-2 in humans. As SARS-CoV-2 was reported a relatively low risk of mortality [3] compared to the other two serious human coronaviruses, SARS-CoV and MERS-CoV, we compared and contrasted the host transcriptomes in response to the viral infections. We found that some cytokines in SARS-CoV-2-infected cells were markedly up-regulated at a very early stage, which was not observed for SARS-CoV and MERS-CoV and even less frequently observed for other viruses. The unusual high expression of cytokines at 0 hpi possibly explains why patients with severe clinical symptoms rapidly deteriorated. Although the number of infected cases was very high, the majority of infections displayed mild symptoms which are partly explained by a gradual increase in host antiviral capability from 7 to 24 hpi. In contrast to SARS-CoV-2, both SARS-CoV and MERS-CoV were able to inhibit the antiviral capability of the host significantly, which could explain their observed relatively high mortalities. MERS was associated with a higher mortality than SARS, which could be in part attributed to the higher expression of cytokines suppressing the antiviral responses. Recently, Blanco-Melo et al. [5] have published transcriptome data of host responses to SARS-CoV-2 from in vitro cell lines including A549 (MOI of 0.2) and NHBE (MOI of 2) at 24 hpi. This previously published data is complemented by our study designed to investigate the early response phase of cell lines infected with SARS-CoV-2. While the previous work did not observe the elevated levels of IFNB1, IFNL1 and IFNL3, our findings show that not only IFNB1 but also IFNL1 and IFNL3 expressions are upregulated between 7 and 24 hpi ( Figure 3D ). Also, they did not detect gene expression of ACE2 and TMPRSS2 at 24 hpi, while we observed that ACE2 is down-regulated at 24 hpi and TMPRSS2 is only up-regulated at 0 hpi before returning to the normal levels ( Figure 4) . Our time-series sampling revealed distinct early-response features of SARS-CoV-2, which provided a possible explanation for some clinical observations. For example, a recent clinical study [35] found that SARS-CoV-2 could replicate effectively in upper respiratory tract tissues, and that the viral loads appeared earlier (before day 5) and were substantially more than expected. Findings from the present study have confirmed that, at 7 hpi, the 3' end of SARS-CoV-2 genome start to express densely, reducing the effectiveness of host immune surveillance, which possibly enables the rapid replication of SARS-CoV-2 in upper respiratory tract tissues. In spite of the fact that several studies have already demonstrated a consistent correlation between gene expression measured by RNA-Seq and by microarray [36] [37] [38] , we still need to exclude the possibility of bias resulting from different methodologies. First, because RNA-Seq can potentially detect more genes than microarrays, we only Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia Clinical Characteristics of Coronavirus Disease 2019 in China Estimates of the severity of coronavirus disease 2019: a model-based analysis From SARS to MERS, Thrusting Coronaviruses into the Spotlight. Viruses SARS-CoV-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients Regulation of Early Host Immune Responses Shapes the Pathogenicity of Avian Influenza A Virus Clinical features of patients infected with 2019 novel coronavirus in Wuhan Cell host response to infection with novel human coronavirus EMC predicts potential antivirals and important differences with SARS coronavirus. mBio A Novel Coronavirus from Patients with Pneumonia in China Trimmomatic: a flexible trimmer for Illumina sequence data STAR: ultrafast universal RNA-seq aligner HTSeq--a Python framework to work with highthroughput sequencing data Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China limma powers differential expression analyses for RNA-sequencing and microarray studies Human Coronaviruses: A Review of Virus-Host Interactions. Diseases Human Coronavirus: Host-Pathogen Interaction Release of severe acute respiratory syndrome coronavirus nuclear import block enhances host transcription in human lung cells The interaction between the PARP10 protein and the NS1 protein of H5N1 AIV and its effect on virus replication CMPK2 and BCL-G are associated with type 1 interferon-induced HIV restriction in humans Tetherin inhibits HIV-1 release by directly tethering virions to cells Rapid evolution of PARP genes suggests a broad role for ADPribosylation in host-virus conflicts Global landscape of mouse and human cytokine transcriptional regulation Pathological findings of COVID-19 associated with acute respiratory distress syndrome Dysregulation of immune response in patients with COVID-19 in Wuhan, China SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC RIG-I in RNA virus recognition Toll-Like Receptor 3 Signaling via TRIF Contributes to a Protective Innate Immune Response to Severe Acute Respiratory Syndrome Coronavirus Infection. mBio IRF-3, IRF-5, and IRF-7 coordinately regulate the type I IFN response in myeloid dendritic cells downstream of MAVS signaling The molecular basis for functional plasticity in type I interferon signaling Virological assessment of hospitalized patients with COVID-2019 A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae Correlation between RNA-Seq and microarrays results using TCGA data Large scale comparison of gene expression levels by microarrays and RNAseq using TCGA data The sequencing data from this study have been submitted to the National Genomics Data Center (https://bigd.big.ac.cn/) with the accession number PRJCA002617. Jiang, Tan and Huang devised the experiment and wrote the paper; Sun, Wu, Sheng and Zhu conducted bioinformatics analysis; Ye, Huang and Yang prepared samples; Wang provided Calu-3 cell; Pan run RNA Sequencing; and all authors revised the manuscript. The authors declare that they have no conflict of interest.