key: cord-254291-y8xvh6hs authors: yamanaka, miles; crisp, tracey; brown, rhonda; dale, beverly title: nucleotide sequence of the inter-structural gene region of feline infectious peritonitis virus date: 1998 journal: virus genes doi: 10.1023/a:1008099209942 sha: doc_id: 254291 cord_uid: y8xvh6hs the sequence of the region located between the s and m glycoprotein genes of the 79-1146 strain of feline infectious peritonitis virus (fipv) is presented. the inter-structural gene region encodes 3 open reading frames (orfs), termed orfs 3a, 3b and 4, with nucleotide sequences conforming to the minimum conserved transcription signal upstream of each. an additional orf, 3x, partially overlaps the 3′ end of orf 3a. the fipv interstructural gene region is identical in length when compared to the insavc-1 strain of canine coronavirus (ccv) but differs from various strains of transmissible gastroenteritis virus (tgev) by the presence of deletions and insertions. the sizes of orf 3a and 4 are conserved in fipv, tgev and ccv. however, as with ccv, the fipv orf 3b is truncated in comparison with tgev. feline infectious peritonitis is a disease characterized by immunopathology and caused by a coronavirus. in fipv-infected cells, 7 viral mrnas have been detected (1) . two of these originate from a region of the fipv genome lying between the genes encoding s and m. this inter-structural gene region has been examined in the related coronaviruses transmissible gastroenteritis virus (tgev) and canine coronavirus (ccv) (2±6). the arrangement of the open reading frames (orfs) in the inter-structural gene region has been described for the fipv genome (1,7), but detailed sequence has not been presented. here, we report on the sequence of this part of the fipv genome. we screened an fipv 79-1146 cdna library with oligonucleotide probes derived from the published s sequence (8) and isolated clones containing the interstructural gene region. 
the sequence of one of these cdnas (genbank accession number af033000) was analyzed in detail. the overall organization of the fipv interstructural gene region is similar to that of ccv and tgev. three orfs encoding polypeptides of 71, 40 and 82 residues are present in the fipv sequence; these have been designated orf 3a, 3b and 4, respectively, in ccv and tgev (orf 4 is also known as the small membrane gene because it encodes a polypeptide that is similar in sequence to an infectious bronchitis virus membrane protein). an additional orf of 71 residues partially overlaps the 3′ end of fipv orf 3a. this orf is present in ccv but absent from tgev, and has been called orf 3x (6). upstream of the fipv orf 3a and orf 4 is the nucleotide sequence ctaaac that is the minimum conserved transcription signal found in other fipv, tgev and ccv genes (6, 9). a related sequence, ctaaat, is present upstream of fipv orf 3b. for these orfs, the distance between the transcription signal and the start of translation is conserved in the 3 coronaviruses. the inter-structural gene regions of fipv and the insavc-1 strain of ccv are identical in size. the sequence identity of the two regions is 89.7% at the nucleotide level. a 52 base deletion, located 59 bases upstream of orf 3a, and a 63 base insertion, starting 153 bases upstream of orf 3b, are present in the fipv interstructural gene region relative to the purdue strain of tgev. different strains of tgev also show variation in these same regions, emphasizing the sequence flexibility in this part of the coronavirus genome. the sequence identity between fipv 79-1146 and the purdue strain of tgev in the inter-structural gene region is 90.7%. the products of fipv orf 3a and orf 4 are identical in length to the corresponding polypeptides of ccv and tgev (purdue strain), with amino acid similarities of 91.5% and 81.7%, respectively, between fipv and ccv, and 90.1% and 75.6% between fipv and tgev. 
in contrast, an amber codon limits fipv orf 3b to only 40 residues while orf 3b of the purdue strain of tgev extends 244 amino acids (3) . although the fipv orf 3b is shorter, the region distal to the amber codon is similar in nucleotide sequence and identical in length to the remaining portion of the tgev orf 3b sequence. this is consistent with the idea that a base substitution has created a premature stop codon in the fipv orf 3b coding region. for ccv, orf 3b is also limited to 31 residues. differences in expression and primary sequence of orf 3b occur in various tgev strains, and orf 3b is truncated in fipv and ccv. this indicates that orf 3b is not absolutely required for virus growth. sequence determination of virus passaged in cats will help to answer questions about the requirement for orf 3b expression in the virus life cycle (7) . the coronaviridae we thank our colleagues lloyd chavez and bill acree at fort dodge laboratories for supplying the virus stock and for stimulating discussions. this work was supported by fort dodge laboratories and scios, inc. key: cord-288239-ca5uthvd authors: jeoung, hye-young; lim, ji-ae; jeong, wooseog; oem, jae-ku; an, dong-jun title: three clusters of bovine kobuvirus isolated in korea, 2008–2010 date: 2011-03-12 journal: virus genes doi: 10.1007/s11262-011-0593-9 sha: doc_id: 288239 cord_uid: ca5uthvd fecal samples (n = 107) were collected from cattle with ascertained or suspected diarrheal disease on korean farms during 2008–2010. of these, 37 samples tested positive for bovine kobuvirus. the 37 positive samples came from 32 cattle that exhibited diarrhea and five cattle that were non-diarrhetic. the majority of the virus-positive feces samples were from calves under 1 month of age (n = 25). 
nine of the 37 cattle infected with bovine kobuvirus were confirmed to have a co-infection with other viruses including bovine rotavirus (n = 3), bovine coronavirus (n = 1), bovine viral diarrhea virus (n = 1), and both bovine coronavirus and bovine viral diarrhea virus (n = 4). a neighbor-joining tree grouped 36 of the korean kobuvirus strains (with the exception of the kb8 strain) into three clusters (g1, g3, and g4), while strains derived from thailand and japan (except the u1 strain) were included in the g2 cluster. the results indicated that korean bovine kobuvirus has diverse lineages regardless of disease status and species. electronic supplementary material: the online version of this article (doi:10.1007/s11262-011-0593-9) contains supplementary material, which is available to authorized users. gyongbuk (n = 10), and gyongnam (n = 4). viral rna was extracted from feces using trizol ls according to the manufacturer's instructions (invitrogen, carlsbad, ca, usa). bovine kobuvirus was detected in fecal samples using reverse transcription-polymerase chain reaction (rt-pcr) as previously described [3]. oligonucleotide primers were designed based on the genome sequence of the u-1 strain (accession no. ab084788) and have the following sequences: u-1f (sense, 5′-catgctcctcggtggtctca-3′; nt 7,357) and u-1r (antisense, 5′-gtccgggtccatcacagggt-3′; nt 7,987). together, these primers amplify a 631-bp region of the 3d gene. pcr products of 631 bp were visualized by electrophoresis and cloned using the pgem-t vector system ii (promega, madison, wi, usa). the cloned genes (three per sample) were sequenced, using t7 and sp6 promoter-specific primers, with an abi prism 3730xl dna sequencer (applied biosystems, foster city, ca, usa) at the macrogen institute (macrogen, seoul, korea). 
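The primer coordinates and the stated product size can be cross-checked with a little arithmetic. The sketch below assumes that nt 7,357 and nt 7,987 are the outermost positions of the amplicon on the u-1 genome (the text gives only one coordinate per primer, so this reading is an assumption), and the function names are illustrative only.

```python
# Minimal sketch: check that the primer coordinates match the stated amplicon.
# Assumption: 7357 is the first base of the sense primer and 7987 is the last
# base of the amplicon (the 5' end of the antisense primer on the genome).

U1_F = "catgctcctcggtggtctca"   # sense primer as printed, whitespace removed
U1_R = "gtccgggtccatcacagggt"   # antisense primer as printed, whitespace removed


def amplicon_length(first_base: int, last_base: int) -> int:
    """Length of a fragment given inclusive, 1-based genome coordinates."""
    if last_base < first_base:
        raise ValueError("last_base must not precede first_base")
    return last_base - first_base + 1


def primer_length(primer: str) -> int:
    """Number of bases in a primer, ignoring any whitespace."""
    return len("".join(primer.split()))


expected_size = amplicon_length(7357, 7987)
print(expected_size, primer_length(U1_F), primer_length(U1_R))
```

Under that assumption the coordinates agree with the 631-bp product reported in the text.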
to investigate the relationship between kobuvirus and other bovine viruses that cause diarrhea in cattle, a screening test was conducted using primers specific for the detection of bovine rotavirus (brv) [9], bovine coronavirus (bcv) [10], and bovine viral diarrhea virus (bvdv) [11], as previously described. reverse transcription of the extracted rna was performed using a cdna synthesis kit (takara) and random hexanucleotide primers. the rt-pcr was run according to the following temperature-time profile: 42°c for 30 min, then 94°c for 5 min, followed by 35 cycles of virus-specific conditions, as follows: brv: 94°c for 30 s, 54°c for 1 min and 72°c for 1 min; bcv: 94°c for 1 min, 58°c for 1 min and 72°c for 2 min; and bvdv: 94°c for 1 min, 56°c for 1 min and 72°c for 1 min. for all viruses, the 35 denaturation-annealing-extension cycles were followed by a final extension at 72°c for 10 min. the resulting amplicon sizes were 309 bp for brv, 730 bp for bcv, and 288 bp for bvdv. the sizes were assessed by 1% agarose gel electrophoresis and confirmed by sequencing and analysis of the nucleotide sequence of each amplicon. the nucleotide sequences of the korean bovine kobuviruses were compared to those of kobuvirus reference strains in the genbank database by blast. the nucleotide sequences were aligned using the clustal w 1.8x program [12] and .aln files were generated. the .aln files were then converted to .meg files using mega 4 [13] and a neighbor-joining tree was constructed (bootstrap replicates = 1,000) using the kimura 2-parameter method with pairwise deletion and uniform rates. the nucleotide sequences of korean bovine kobuvirus strains were deposited in genbank as accession numbers hq650164-hq650200 (table 1). in prior studies, bovine kobuvirus was detected in 12 of 72 (16.7%) stool samples in japan [3], 6 of 72 (8.3%) fecal samples in thailand [5], and 2 of 32 (6.25%) fecal samples in hungary [6]. 
korean bovine kobuvirus was markedly more prevalent, being detected in 37 of the 107 (34.6%) fecal samples. furthermore, the yearly frequency of the korean positive samples was nearly constant: n = 12 in both 2008 and 2009, and n = 13 in 2010 (table 1). thirty-two of the 86 diarrhea samples (37.2%) contained kobuvirus, compared with 5 out of 21 non-diarrhea samples (23.8%). however, this result cannot be taken as evidence of a causal relationship between kobuvirus infection and diarrhea, and such a causal relationship has been questioned in previous analyses [3, 5-7]. infection by kobuvirus occurred in 38.5% (30 of 78) of the korean native cattle and 24.1% (7 of 29) of the holstein cattle. this result indicated that kobuvirus infection is not restricted to a single cattle species. regarding the age of infected cattle, this study clearly showed a predominance of infection in calves under the age of 1 month (n = 25), a result similar to that of a previous study [5]. kobuvirus prevalence by geographic region was 45.2% (19/42) in chungnam, 30% (3/10) in gyongbuk, 28.1% (9/32) in gyonggi, 27.3% (3/11) in chungbuk, 25% (2/8) in gangwon, and 25% (1/4) in gyongnam. in spite of these seemingly substantial differences in prevalence, the low numbers of samples did not provide sufficient statistical power to allow any conclusions regarding geographic predilection. the geographic differences can therefore be considered tenuous at this point, requiring further study with larger sample sizes. the clinical significance of any such differences remains unclear. co-infection with bovine kobuvirus and other viruses was observed in nine cattle: brv (n = 3), bcv (n = 1), bvdv (n = 1), and bcv + bvdv (n = 4). however, it is unclear whether the other viruses are directly associated with the kobuvirus infection. 
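The prevalence figures above follow directly from the reported counts, and the caution about causality can be illustrated with a simple 2x2 comparison of diarrheic and non-diarrheic samples. The sketch below is not part of the original analysis: the choice of a two-sided Fisher's exact test and the function names are assumptions made here for illustration, and only the counts come from the text.

```python
# Minimal sketch (not the authors' analysis): recompute prevalence from the
# reported counts and run a two-sided Fisher's exact test on the 2x2 table
# of kobuvirus status versus diarrhea status, using only the standard library.
from math import comb


def prevalence(positive: int, total: int) -> float:
    """Percentage of positive samples, rounded to one decimal place."""
    return round(100.0 * positive / total, 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


overall = prevalence(37, 107)        # all fecal samples
diarrhea = prevalence(32, 86)        # diarrhea samples
non_diarrhea = prevalence(5, 21)     # non-diarrhea samples

# Rows: diarrhea / non-diarrhea; columns: kobuvirus positive / negative.
p_value = fisher_exact_two_sided(32, 86 - 32, 5, 21 - 5)
print(overall, diarrhea, non_diarrhea, round(p_value, 3))
```

The same helper can be applied to the breed or regional counts; with group sizes this small, a non-significant result should be read as inconclusive rather than as evidence of no difference.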
neighbor-joining analysis revealed that partial nucleotide sequences (590 bp in length) of the 3d genes of 56 bovine kobuvirus (37 from korea, 13 from japan, and 6 from thailand), along with that of the aichi virus (as the outgroup), fell into four main lineages (g1, g2, g3, and g4). with the exception of the u1 and kb8 strains, all of the sequences fell into one of these four lineages (fig. 1) . the four lineages were supported by high bootstrap values (75-99%) at the node of each branch. interestingly, the 36 korean kobuvirus strains formed three lineages (g1, g3, and g4), while the 12 japanese and six thailand strains all fell within the g2 lineage (fig. 1) . a future analysis using a larger number of strains may be required to confirm that the u1 and kb8 strains represent the first recognized strains of an additional cluster or two additional clusters. in conclusion, the findings of this study demonstrate the existence of four phylogenetic lineages of bovine kobuvirus. korean kobuvirus strains are found in three of the four lineages, with japanese and thailand strains being clustered together in the other lineage. virus taxonomy, 8th report of the ictv acknowledgment the authors are grateful to ms. bo-hye shin and ms. hyen-jung kim for their technical assistance. key: cord-303834-yqysedne authors: ducatez, mariette f.; liais, etienne; croville, guillaume; guérin, jean-luc title: full genome sequence of guinea fowl coronavirus associated with fulminating disease date: 2015-02-25 journal: virus genes doi: 10.1007/s11262-015-1183-z sha: doc_id: 303834 cord_uid: yqysedne guinea fowl coronavirus (gfcov), a recently characterized avian coronavirus, was identified from outbreaks of fulminating disease (peracute enteritis) in guinea fowl in france. the full-length genomic sequence was determined to better understand its genetic relationship with avian coronaviruses. 
the full-length coding genome sequence was 26,985 nucleotides long with 11 open reading frames and no hemagglutinin–esterase gene: a genome organization identical to that of turkey coronavirus [5′ untranslated region (utr)—replicase (orfs 1a, 1ab)—spike (s) protein—orf3 (orfs 3a, 3b)—small envelop (e or 3c) protein—membrane (m) protein—orf5 (orfs 4b, 4c, 5a, 5b)—nucleocapsid (n) protein (orfs n and 6b)—3′ utr]. this is the first complete genome sequence of a gfcov and confirms that the new virus belongs to group gammacoronaviruses. electronic supplementary material: the online version of this article (doi:10.1007/s11262-015-1183-z) contains supplementary material, which is available to authorized users. coronaviruses (covs) are enveloped viruses with positivesense, non-segmented rna genomes of 25-32 kb. covs infect a wide range of hosts causing various degrees of morbidity and mortality. group i covs (alphacoronaviruses) contain viruses that infect not only humans (hcov-229e and hcov-nl63) but also cats and dogs (with feline cov and canine cov, respectively), or pigs (with the porcine transmissible gastroenteritis virus, tgev for example). similarly, group ii covs (betacoronaviruses) may infect humans (examples: hcov-oc43, hcov-hku1, severe acute respiratory syndrome (sars)-related covs or the recently emerged mers-cov), horses (with ecov), or cattle (with bcov). in contrast, group iii covs (gammacoronaviruses) primarily infect birds: chickens, peafowl, and partridges harbour infectious bronchitis virus (ibv) while turkeys have turkey cov (tcov) and guinea fowl may be infected with guinea fowl cov (gfcov). gammacoronavirus strains have however been isolated from a whale and a wild felid [1] . group iv covs (deltacoronaviruses) have been detected in birds (with bucov, mucov, spcov, etc.), or pigs (with porcine deltacoronavirus) [2] . interestingly covs of the groups i, ii, and iv have been detected in chiroptera (bats), thought to be the reservoir of covs [3, 4] . 
in the present study, we focused on a new member of the group iii covs, gfcov, and aimed at sequencing its full genome to better understand its molecular relationship with gammacoronaviruses. to determine the full genome of gammacov/guinea fowl/ france/s/2011 (gfcov/fr/2011), we first analysed the data generated on a miseq illumina platform as previously described [5] . briefly, pooled intestinal contents of experimentally infected guinea poults were clarified, ultracentrifuged, and treated with nucleases to concentrate encapsidated viral material. rna was extracted, and a random rt-pcr was performed to generate unbiased pcr products of about 300 bp [5, 6] . the sequences generated that matched with avian covs sequences, as determined using gaas software [7] , were extracted for further analysis and visualized using integrative genomics viewer (igv) with the closest blast hit as reference genome: tcov mg10 (accession number: eu095850) [8] . primers were designed based on the known sequence data to amplify missing genome fragments by pcr. sanger sequencing was then performed with pcr primers. the full genome sequence was submitted to embl and was attributed the following accession number: [ln610099]. sequence analysis was carried out using bioedit version 7.0.8.0 [9] , muscle for the alignment [10] , and mega version 5.05 for the phylogeny [11] . the gfcov-generated sequences were assembled into one contiguous coding sequence of 26,985 nucleotides. the entire genome had a gc content of 38.3 %, identical to the turkey coronavirus (tcov) mg10 genome [12] . 
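A GC content such as the 38.3 % quoted above is the fraction of g and c among the unambiguous bases of the sequence. The following sketch shows one way to compute it; the function name and the decision to skip ambiguous bases are assumptions made here, and the example string is a toy sequence rather than the gfcov genome.

```python
# Minimal sketch: GC percentage of a nucleotide sequence.
# Assumption: ambiguous symbols (n, gaps, whitespace) are ignored rather
# than counted in the denominator.

def gc_content(sequence: str) -> float:
    """Return the GC percentage over unambiguous bases (a, c, g, t, u)."""
    seq = sequence.lower()
    gc = seq.count("g") + seq.count("c")
    at = seq.count("a") + seq.count("t") + seq.count("u")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


toy_sequence = "atgcatnnat"   # illustrative only
print(round(gc_content(toy_sequence), 1))
```

Applied to a full genome record (for example one read from a fasta file), the same function gives a value that can be compared between gfcov and tcov.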
gfcov and tcov genomes have the same organization: (i) a 5′ untranslated region (utr), (ii) two large slightly overlapping orfs coding for the replicase: 1a and 1ab, (iii) gene coding for the spike (s) protein, (iv) orf3 (orfs 3a, 3b), (v) gene coding for the small envelope (e or 3c) protein, (vi) gene coding for the membrane (m) protein, (vii) orf5 (4b and 4c, 5a, 5b), (viii) genes coding for the nucleocapsid (n) protein (orfs n and 6b), and (ix) 3′ utr (table 1). the multiprotein on single orfs is generated by alternative translation. while the role of avian coronavirus (ibv) structural proteins is known: binding to rna, nucleocapsid formation and role in cell-mediated immunity for n; virus budding site determination, role in virus particle assembly and in interferon-induction, interaction with viral nucleocapsid for m; association with viral envelope, role in virus particle assembly and putatively in apoptosis for e; binding to cellular receptors, induction of fusion between viral and cellular membranes, induction of neutralizing antibodies and role in cell-mediated immunity for s; little is known about the function of non-structural proteins. it has mainly been shown that they are not essential for virus replication in vitro but likely help the virus replicate in vivo [13, 14]. the proteins 3a, 3b, 4b, 5a, and n were of the same size in gfcov and tcov. sizes of other proteins varied, but within the range observed previously between different tcov strains. interestingly, gfcov/fr/2011 harboured a shorter small envelope protein than its tcov counterparts (table 1). further studies are warranted to understand the impact of avian cov protein sizes on the biology of the viruses. phylogenetic analysis of the full genome of gfcov/fr/2011 showed it clearly clustered with north american tcov strains (fig. 1a, supported by a high bootstrap value of 100), as was observed previously for the s gene [5]. 
the genetic distance between gfcov/fr/2011 and tcov ranged between 10.7 and 11.4 %, while genetic distances between gfcov/fr/2011 and representative ibv strains were larger and varied between 13.5 and 15.0 % (supplementary table). a simplot analysis comparing the gfcov/fr/2011 full genome to its closest tcov and ibv blast hits showed that the three genomes are highly similar throughout the genome (74-100 % similarity, with no significantly higher identity of gfcov/fr/2011 with tcov or ibv genomes), except for the s gene (fig. 1b). the gfcov s gene was indeed more closely related to tcov s than to ibv s genes but also more distinct from both viruses in the s gene than in the rest of its genome (<50 % identity for ibv and 65-90 % identity with tcov s genes, fig. 1b), suggesting a recombination event as was hypothesized for the origin of tcov [15]. a parallel evolution from a common ancestor with a much higher substitution rate in the s gene than in the rest of the genome can however not be ruled out at this stage. the present study showed that gfcov/fr/2011 harbours a genome organization very similar to that of tcov strains. in addition, and again like tcov, gfcov/fr/2011 likely originated from a recombination event between an ibv-like (or tcov-like) virus that would have given most of its genome and a so far unknown cov that would have contributed its spike gene. despite the similarity of their genomes and their enteric tropism, tcovs often cause mild clinical signs while gfcovs are usually associated with extremely high mortalities in their host, suggesting strikingly different host-virus interactions. further studies are ongoing to understand the host range of gfcov/fr/2011 and its determinants of pathogenicity. 
infectious diseases of wild mammals and birds in europe nucleic acids symp diseases of poultry fields virology acknowledgments this work was supported by the 'epicorem' grant of the agence nationale de la recherche (anr), by the french key: cord-333914-c150ki1n authors: koba, ryota; suzuki, satori; sato, go; sato, shingo; suzuki, kazuo; maruyama, soichi; tohya, yukinobu title: identification and characterization of a novel bat polyomavirus in japan date: 2020-08-20 journal: virus genes doi: 10.1007/s11262-020-01789-7 sha: doc_id: 333914 cord_uid: c150ki1n a novel polyomavirus (pyv) was identified in the intestinal contents of japanese eastern bent-wing bats (miniopterus fuliginosus) via metagenomic analysis. we subsequently sequenced the full genome of the virus, which has been tentatively named miniopterus fuliginosus polyomavirus (mfpyv). the nucleotide sequence identity of the genome with those of other bat pyvs was less than 80%. phylogenetic analysis revealed that mfpyv belonged to the same cluster as pyvs detected in miniopterus schreibersii. this study has identified the presence of a novel pyv in japanese bats and provided genetic information about the virus. electronic supplementary material: the online version of this article (10.1007/s11262-020-01789-7) contains supplementary material, which is available to authorized users. bats are considered the natural reservoirs of a variety of zoonotic rna viruses, such as ebola viruses, marburg viruses, and severe acute respiratory syndrome coronavirus [1] [2] [3] . several dna viruses, including adenoviruses, herpesviruses, and polyomaviruses (pyvs), have also been detected [4] [5] [6] . however, the pathogenic and zoonotic roles of these dna viruses have been not clarified. pyvs are small double-stranded dna viruses with a circular genome of approximately 5 kbp. the viral genome consists of three regions: regulatory, early, and late regions. 
the regulatory region is responsible for transcription from both the early and late promoters and the initiation of viral dna synthesis. the early region contains genes encoding the large t antigen (tag) and small t antigen (tag). the late region contains the structural proteins vp1, vp2, and vp3 [7] . although pyv diversity in bat populations in north, central, and south america, africa, indonesia, and new zealand were investigated in previous studies [6, [8] [9] [10] [11] [12] , the prevalence and genetic diversity of pyvs in japanese bats remain unclear. the aims of this study were to (i) determine the presence of pyvs in japanese bats, (ii) characterize the genomic structure of bat pyvs, and (iii) analyze the evolutionary relationships between the bat pyv detected in this study and other known bat pyvs. eighteen bats (miniopterus fuliginosus) were collected in wakayama prefecture, japan. the pooled intestinal contents obtained from each bat (approximately 1 g/body) were prepared as a 10% suspension in sterilized phosphate-buffered saline (pbs a total of 10,136,210 reads and 5362 contigs were obtained in a pool of sample from 18 bats. to identify homologous sequences, the obtained genomic data were analyzed via a blastn search using the dna data bank of japan database in accordance with a previously reported method [13, 14] . virus-related sequences were identified in 123 contigs. of these, 14 contigs contained pyv-like sequences with high identities. other contigs contained the sequences of eel river basin pequenovirus, montastraea cavernosa colony-associated virus, and grapevine-associated totivirus-1. to determine the complete viral genome of these pyv-like sequences, pcr was performed using la taq dna polymerase (takara bio, otsu, japan) in accordance with the manufacturer's instructions. specific pcr primers were designed on the basis of the sequences obtained from the contigs. 
the primer sequences were as follows: pyv-f1 (sense, 5′-aag ttt gca gta gtc ttt gaa gat gtg aag ggt c-3′), pyv-r1 (antisense, 5′-cac tcc tgg gct ttc ctg ctc ata ttt atg ca-3′), pyv-f2 (sense, 5′-cat aaa cag ggt caa acc ac-3′), and pyv-r2 (antisense, 5′-aag cac tcc acc aaa gga aa-3′). dna extracted from the pooled sample of bat intestinal contents was used as the template. the pcr products were visualized via electrophoresis on a 1% agarose gel stained with sybr safe (life technologies, carlsbad, ca, usa). the full genomic dna could be amplified by two independent pcr using the aforementioned primers. the amplified dna was cloned by inserting the pcr product into the pcr2.1 topo vector (life technologies) in accordance with the manufacturer's instructions. the obtained sequences were analyzed using the bigdye terminator v3.1 cycle sequencing kit (life technologies), and nucleotide sequences were assembled using atgc computer software (genetyx corporation). a homology search was performed using ncbi blast. the genome of miniopterus fuliginosus polyomavirus (mfpyv) has a length of 4956 bp (accession number: lc529726). the genome organization includes an early region coding for tag and tag on one strand and a late region encoding the capsid proteins vp1, vp2, and vp3 on the opposite strand. a noncoding regulatory region (nccr) was located between the start of the early region and that of the late region, in line with previous findings for bat pyvs (fig. 1a and supplementary table 1) [9] [10] [11] [12] . interestingly, open-reading frames encoding vp2 and vp3 of mfpyv did not overlap with that of vp1. the stop codons of vp2 and vp3 are located at base positions 1184-1186, whereas the start codon of vp1 is located at base positions 1188-1190. a single nucleotide (guanine) at 1187 separates the vp2/3 and vp1 regions ( fig. 1a and supplementary fig. 1 ). 
therefore, genomic composition of mfpyv is genetically different from those of typical pyvs in terms of non-overlapping vp regions. figure 1b and c present the phylogenetic trees of vp1 and tag of mfpyv and 28 other bat pyvs constructed using neighbor-joining analysis. based on phylogenetic analyses of the vp1 and tag amino acid sequences, both regions of mfpyv are closely related to those of other miniopterus pyvs and group b bat pyvs (fig. 1b and c) . vp1 is a major pyv structural protein that is indispensable for entry of the virus into host cells [7] . mfpyv vp1 displayed less than 72% nucleotide sequence identity with other bat pyvs (supplementary table 2 ). tag is a multifunctional protein that plays important roles in viral dna replication and the regulation of viral and cellular gene expression [15] [16] [17] . the predicted mfpyv tag exhibited low similarity (< 73%) with those of other pyvs (supplementary table 2 ). mfpyv tag sequences contained features known to be conserved in tags of other bat pyvs, including the highly conserved dnaj domain (hpdkgg), a retinoblastoma (rb)-binding motif (lycne), and several functional motifs (supplementary fig. 2 ). according to a previous report, these elements work together to bind rb and interrupt its interaction with the e2f transcription factor to promote viral replication and cell cycle progression [18] . tag is generated via alternative splicing of the early mrna transcript [11, 19] . in the early region of the mfpyv genome, conserved predicted splice donor sites are located at base positions 4729-4734 (cct gag /gta agg ) and 4346-4351 (ttt cag /gtc ttc ) (fig. 1a) . in the deduced nccr region of the mfpyv genome, several conserved elements were identified, including several copies of the consensus tag binding site gaggc and its reverse complement gcctc supplementary fig. 3) . these elements are likely to comprise the core of the replication origin [20] . 
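Locating the pentanucleotide gaggc and its reverse complement gcctc in a regulatory region amounts to a string search on both strands. The sketch below shows the idea on a toy sequence; the function names are illustrative, the input is not the mfpyv nccr, and sites spanning the origin of a circular genome are not handled in this simplified version.

```python
# Minimal sketch: report 1-based positions of a motif and its reverse
# complement in a linear dna string. Names and the toy input are assumptions
# for illustration; they are not taken from the original analysis.

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous dna sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def find_motif_sites(sequence: str, motif: str = "gaggc"):
    """Return sorted (position, strand) tuples for every motif occurrence.

    '+' marks the motif as written, '-' marks its reverse complement.
    Overlapping occurrences are reported.
    """
    seq = sequence.lower()
    patterns = (("+", motif.lower()), ("-", reverse_complement(motif)))
    sites = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            sites.append((start + 1, strand))
            start = seq.find(pattern, start + 1)
    return sorted(sites)


toy_region = "ttgaggcaagcctctt"   # illustrative only
print(find_motif_sites(toy_region))
```

For a circular genome, one simple workaround is to append the first few bases of the sequence to its end before searching and then map positions back onto the original coordinates.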
comparison of the full-length genome sequence of mfpyv with those of other bat pyvs revealed that mfpyv is most closely related to the ky156 strain with 70% nucleotide sequence identity (supplementary table 2 ). according to the polyomaviridae study group of the international committee on taxonomy of viruses, a novel pyv species should have < 81% nucleotide sequence identity to other known pyvs [7] . mfpyv exhibited less than 81% nucleotide sequence homology to the known reference pyvs including previously reported bat pyvs. in line with the nomenclature of the other bat pyvs, we propose the name mfpyv for the newly discovered virus. for virus isolation, we attempted to propagate the mfpyv strain using the tb1-lu cell line derived from the lungs of the free-tailed bat tadaria brasiliensis (atcc #ccl-88). however, a cytopathic effect was not observed in the cells following serial passage of the cultures. viral dna replication was also not detected in the cells and supernatant collected at each passage. there is a need for additional research to identify efficient cell culture systems for bat pyvs to elucidate the viral infection/replication mechanisms and their pathogenicity. in conclusion, we detected a novel pyv genome sequence in japanese bats. further epidemiological investigations are needed to determine the extent of pyv genetic variation in various bat species in japan. 
[1] bats: important reservoir hosts of emerging viruses
[2] from sars to mers: 10 years of research on highly pathogenic human coronaviruses
[3] large serological survey showing cocirculation of ebola and marburg viruses in gabonese bat populations, and a high seroprevalence of both viruses in rousettus aegyptiacus
[4] first detection of adenovirus in the vampire bat (desmodus rotundus) in brazil
[5] a novel bat herpesvirus encodes homologues of major histocompatibility complex classes i and ii, c-type lectin, and a unique family of immune-related genes
[6] discovery of diverse polyomaviruses in bats and the evolutionary history of the polyomaviridae
[7] taxonomical developments in the family polyomaviridae
[8] detection of polyoma and corona viruses in bats of canada
[9] novel polyomaviruses in south american bats and their relationship to other members of the family polyomaviridae
[10] detection of novel polyomaviruses in fruit bats in indonesia
[11] genomic characterization of two novel polyomaviruses in brazilian insectivorous bats
[12] discovery of novel virus sequences in an isolated and threatened bat species, the new zealand lesser short-tailed bat (mystacina tuberculata)
[13] evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery
[14] the fecal virome of pigs on high-density farm
[15] site-specific binding of wild-type p53 to cellular dna is inhibited by sv40 t antigen and mutant p53
[16] sv40 large t antigen targets multiple cellular pathways to elicit cellular transformation
[17] cellular transformation by sv40 large t antigen: interaction with host proteins
[18] the molecular chaperone activity of simian virus 40 large t antigen is required to disrupt rb-e2f family complexes by an atp-dependent mechanism
[19] rna processing in the polyoma virus life cycle
[20] sequences flanking the pentanucleotide t-antigen binding sites in the polyomavirus core origin help determine selectivity of dna replication
key: cord-295554-0pzjyrdf authors: lima, francisco esmaile de sales; campos, fabrício souza; kunert filho, hiran castagnino; batista, helena beatriz de carvalho ruthner; carnielli júnior, pedro; cibulski, samuel paulo; spilki, fernando rosado; roehe, paulo michel; franco, ana cláudia title: detection of alphacoronavirus in velvety free-tailed bats (molossus molossus) and brazilian free-tailed bats (tadarida brasiliensis) from urban area of southern brazil date: 2013-03-16 journal: virus genes doi: 10.1007/s11262-013-0899-x sha: doc_id: 295554 cord_uid: 0pzjyrdf a survey was carried out in search for bat coronaviruses in an urban maternity roost of about 500 specimens of two species of insectivorous bats, molossus molossus and tadarida brasiliensis, in southern brazil. twenty-nine out of 150 pooled fecal samples tested positive by reverse transcription-pcr for fragments of the rna-dependent rna polymerase gene of coronavirus-related viruses. the sequences clustered along with bat alphacoronaviruses, forming a subcluster within this group. our findings point to the need for risk assessment and continued surveillance of coronavirus infections of bats in brazil. electronic supplementary material: the online version of this article (doi:10.1007/s11262-013-0899-x) contains supplementary material, which is available to authorized users. bats (order chiroptera, suborders megachiroptera and microchiroptera) are one of the most diverse and widely distributed groups of mammals, representing ~20 % of all known mammalian species [1]. about 100 different viruses have been identified in bats of different species in asia, europe, north america and africa. 
therefore, such species may be natural reservoirs for a large variety of potentially zoonotic rna viruses, such as lyssaviruses, paramyxoviruses, ebola and marburg viruses as well as the recently emerged severe acute respiratory syndrome coronavirus (sars-cov) [2-6]. a variety of other coronaviruses have been detected in many bat species from asia, including specimens of the genus rhinolophus, which were found to be infected with sars-like cov. phylogenetic analyses of such viruses revealed that they form a large clade within the betacoronavirus genus, along with sars coronaviruses from palm civets and the sars coronaviruses recovered from humans during the 2003 outbreak [7, 8]. these data suggested that the agent responsible for the 2002-2003 pandemic might have originated from bats. in addition, in 2012, a new human coronavirus (hcov-emc), which has been associated with clinical disease that resembles sars, emerged in the middle east. this new virus appears to have originated from bats, raising the possibility that hcov-emc jumped species directly from bats to humans [9]. in brazil, most studies looking for associations between bats and viruses have focused on the role of those species as reservoirs for rabies virus [10]. however, to date, more than 160 bat species have been detected in brazil, comprising members of the families phyllostomidae, vespertilionidae, and molossidae. it is estimated that at least 40 bat species live in the state of rio grande do sul, southern brazil, where the predominantly sub-tropical climate seems to favor the settlement of such species [11]. in view of the potential role that bats may play in the transmission of new viral infections to humans and other species, this study was set up to search for coronavirus genomes in bats from the urban area of porto alegre (30°01′59″ s; 51°13′48″ w), a city with about 1.5 million inhabitants and capital of the state of rio grande do sul, brazil.
with that purpose, coronavirus rna was searched for in feces of two species of synanthropic insectivorous bats collected in a maternity roost within the urban area of the city. a maternity roost of bats known to have direct contact with people and domestic animals was identified in the summer of 2012 in the attic of a residence in the central area of porto alegre, southern brazil. the colony was estimated to harbor about 500 insectivorous bats of two species, velvety free-tailed bats (molossus molossus) and brazilian free-tailed bats (tadarida brasiliensis). species identification was confirmed by amplification and sequencing of the mitochondrial cytochrome b (cytb) gene as described [12]. one hundred and fifty fecal samples were collected from the attic floor as follows: a plastic film was spread on the floor of the attic compartment and fresh droppings were collected with clean disposable forks the following night. each sample consisted of five fecal droppings, which were immediately sent to the laboratory and stored at -80 °c. the samples were then submitted to total rna extraction with trizol (invitrogen™). cov rna screening was performed by reverse transcription-polymerase chain reaction (rt-pcr) in a total reaction volume of 25 µl using conserved primers for the rna-dependent rna polymerase gene (forward: 5′-ggttgggactatcctaagtgtga-3′ and reverse: 5′-ccatcatcagatagaatcatcata-3′). this pair of primers is expected to give rise to amplicons of 440 bp [7]. the cycling conditions were: 5 min at 94 °c followed by 35 cycles of 1 min at 94 °c, 1 min at 49 °c and 1 min at 72 °c, followed by a final extension time of 5 min at 72 °c. bovine coronavirus (bcov) rna was used as a positive control to optimize the assay. standard precautions were taken to avoid pcr contamination; blank controls without template were included in every set of five rt-pcr assays.
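as a quick sanity check on the screening assay described above, the following python sketch computes the length and gc fraction of the two rdrp primers and the nominal hold time of the cycling program. the primer strings are copied from the text with the internal spaces removed (assumed to be extraction artifacts), and the helper names are illustrative only; none of this comes from the original study.

```python
# rdrp screening primers as printed in the text; the internal spaces in the
# source are assumed to be extraction artifacts and are removed here.
FORWARD = "ggttgggactatcctaagtgtga"
REVERSE = "ccatcatcagatagaatcatcata"

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def gc_fraction(seq: str) -> float:
    """Fraction of g/c bases in an unambiguous dna sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the sequence the primer base-pairs with."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def program_minutes(initial: float, cycles: int, steps: tuple, final: float) -> float:
    """Nominal hold time of a pcr program, ignoring ramping between steps."""
    return initial + cycles * sum(steps) + final


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), "nt, gc fraction", round(gc_fraction(primer), 2))
    # 5 min at 94 °c, 35 cycles of 1 + 1 + 1 min, 5 min final extension
    print("nominal run time (min):", program_minutes(5, 35, (1, 1, 1), 5))
```

the run time is a lower bound, since real thermocyclers add ramping time between temperature steps.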
five microliters of the pcr products were electrophoresed in 1.5 % agarose gels and the products visualized under uv light after staining with ethidium bromide. the amplicons obtained were cloned with the pcr®2.1-topo® cloning kit (invitrogen) before being submitted to nucleic acid sequencing. sequencing was performed with the big dye terminator cycle sequencing ready reaction (applied biosystems, uk) in an abi-prism 3100 genetic analyzer (abi, foster city, ca), following the manufacturer's protocol. sequence analyses were performed with the blast software [13]. nucleotide sequences were aligned and compared to human and animal cov sequences available in the genbank database with the program clustalx 2.0 [14]. alignments were optimized with the bioedit sequence alignment editor program version 7.0.9 [15]. the protocol to generate the phylogenetic trees was selected with the program modeltest 3.7 [16]. phylogenetic analysis was carried out using mega 5.0; pairwise genetic distances were calculated by the tamura 3-parameter model and phylogenetic trees were constructed using the neighbour-joining method. bootstrap values were determined by 1,000 replicates to assess the confidence level of each branch pattern. pcr amplicons with the expected size of the targeted region were obtained from 29 out of the 150 (19.33 %) pools of bat fecal samples. the nucleotide sequences of sixteen randomly selected amplicons were determined and submitted to genbank (accession numbers kc110770 to kc110785). genetic analyses provided evidence that the viruses circulating in these two species of insectivorous bats belong to the genus alphacoronavirus. when compared with each other, all the obtained sequences showed a high nucleotide and amino acid identity (90.6 to 100 % and 98 to 100 %, respectively) (supplemental material). the rdrp sequences examined here were distantly related (<75 % nt identity) to other known alphacoronaviruses.
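the pairwise distance step named above can be illustrated with a short python sketch of the tamura 3-parameter formula, d = -h ln(1 - P/h - Q) - (1/2)(1 - h) ln(1 - 2Q) with h = 2θ(1 - θ), where P and Q are the proportions of transitions and transversions and θ is the g+c content. this is not the mega implementation: θ is taken here as the pooled g+c content of the two sequences and any column containing a gap or ambiguous base is skipped, both of which are simplifying assumptions.

```python
import math

PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def tamura3_distance(seq1: str, seq2: str) -> float:
    """Tamura 3-parameter distance between two aligned dna sequences.

    Raises ValueError (math domain error) when the sequences are too
    divergent for the logarithms to be defined.
    """
    s1, s2 = seq1.lower(), seq2.lower()
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = gc = 0
    for a, b in zip(s1, s2):
        if a not in valid or b not in valid:
            continue  # skip gaps and ambiguity codes (assumption)
        sites += 1
        gc += (a in "gc") + (b in "gc")
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    theta = gc / (2 * sites)
    h = 2 * theta * (1 - theta)
    # h is zero only when no transitions are possible, so the term vanishes
    first = -h * math.log(1 - p / h - q) if h > 0 else 0.0
    return first - 0.5 * (1 - h) * math.log(1 - 2 * q)
```

with θ = 0.5 the expression reduces to the kimura 2-parameter distance, which is a convenient way to check an implementation.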
the closest bat coronavirus rdrp sequences found in genbank were the asian (btcov/a633/2005) and north american (rm-btcov 6 and rm-batcov 11) bat coronaviruses (fig. 1). the percentage of nucleotide similarity between the sequences described here and those of asian and north american coronaviruses ranged from 72.4 to 76 %, whereas at the amino acid level, the similarity ranged from 74 to 81 % (data not shown). during the last two decades, several studies have shown that various important human and animal pathogens are of bat origin; these species have become targets for several surveillance studies aiming at the detection of other viruses potentially pathogenic for humans and other animals. the association of these pathogens with possible disease outbreaks caused by direct or indirect contact of humans with bats stimulated the development of research activities on bat-borne viruses. in addition, advances in molecular techniques offer opportunities for the discovery of novel dna and rna bat viruses without the need for virus isolation, and pooled bat fecal samples can be used as the source of viruses, avoiding animal manipulation [17, 18]. in our study, we detected rdrp sequences of bat cov at a frequency of 19.33 % in the examined samples; such frequency is comparable to previous results obtained in similar studies from different bat species in other countries (ranging [...]).
fig. 1 (caption fragment): the tree was generated based on the neighbor-joining method in the mega program. the nucleotide sequence of the equivalent genome fragment of sars-cov was included as outgroup.
these results show that similar coronaviruses are found in different bat species that are distributed in geographically distant regions, suggesting a low degree of host restriction for coronavirus in those bat populations.
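the detection frequency quoted above (29 positive pools out of 150, 19.33 %) can be reproduced, and given a rough uncertainty range, with the sketch below. the wilson score interval is our choice for illustration and was not reported in the study; because each sample was a pool of five droppings, the figure describes pool-level positivity and should not be read as the prevalence of infection in individual bats.

```python
import math


def wilson_interval(positives: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95 %)."""
    p = positives / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    positives, pools = 29, 150  # counts taken from the text
    low, high = wilson_interval(positives, pools)
    print("pool-level positivity: %.2f %%" % (100 * positives / pools))
    print("approximate 95 %% interval: %.1f to %.1f %%" % (100 * low, 100 * high))
```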
in contrast to the enormous diversity of cov genomes found in old world bats [12, 25], in this study and in several others concerning cov detection in new world bats, only alphacoronaviruses were detected [2, 3, 17, 18]. based on these results, it has been hypothesized that covs found in new world bats are less diverse than those detected among old world bats [26]. in this initial study, samples were restricted in location and variety of bat species, and we found only alphacoronaviruses. such findings do not reflect data on incidence or prevalence of such infections in bat populations. however, one cannot exclude the possibility that a greater diversity may become apparent in brazilian bats as larger numbers of samples from a wider spectrum of species are examined. to our knowledge, this is the first report of cov detection in feces from presumably healthy insectivorous bats in brazil. however, it is very likely that other bat species might also be infected with similar viruses. additional studies with larger numbers of bats and bat species, as well as continued vigilance on the occurrence of viral infections in bats over the years, are required to follow the evolution of bat coronaviruses in their interactions with the different bat host species. in addition, the detection of covs in brazilian bat populations in close proximity to human inhabitants may represent a risk to human health. our findings point to the need to identify the prevalence of covs in brazilian bats, to perform risk assessment studies and to continue surveillance of coronavirus infections in both urban and rural environments.
mammal species of the world: a taxonomic reference; vector borne zoonotic dis; proc. natl. acad. sci. usa; bioedit: a user-friendly biological sequence alignment editor and analysis program for windows 95/98/nt
acknowledgments we would like to thank the government agencies finep, cnpq, and capes for the financial support. p.m. roehe, a.c. franco, and f.r.
spilki are cnpq research fellows.
key: cord-269340-o9jdt86j authors: callison, scott andrew; jackwood, mark w.; hilt, deborah ann title: infectious bronchitis virus s2 gene sequence variability may affect s1 subunit specific antibody binding date: 1999 journal: virus genes doi: 10.1023/a:1008179208217 sha: doc_id: 269340 cord_uid: o9jdt86j the s2 gene of several strains of infectious bronchitis virus (ibv) belonging to the arkansas, connecticut, and florida serotypes was sequenced. phylogenetic analysis of the s2 gene nucleotide and deduced amino acid sequence data resulted in groups of strains that were the same as groupings observed when s1 sequence data was used. thus, it appears that s2 subunits are conserved within a serotype but not between serotypes. although the sequence differences were small, we found that only a few amino acid differences were responsible for different secondary structure predictions for the s2 subunit. it is likely that these changes create different interactions between the s1 and s2 subunits, which could affect the conformation of the s1 subunit where serotype specific epitopes are located. based on this sequence data, we hypothesize that the s2 subunit can affect specific antibody binding to the s1 subunit of the ibv spike glycoprotein. infectious bronchitis (ib) is an acute, highly transmissible, upper respiratory tract disease in chickens. clinical signs include tracheal rales, nasal exudate, coughing, and sneezing. infectious bronchitis affects both sexes and the disease may spread to the reproductive and renal systems (1). it is of economic importance because it can cause poor weight gain and reduced feed efficiency in broilers and a decline in egg production and egg quality in layers (2). infectious bronchitis virus (ibv), the causal agent of ib, is a member of the coronaviridae family. the virion is pleomorphic (diameter 90-200 nm) and enveloped with club-shaped surface projections (spikes) on the surface of the virion.
it contains a single stranded, positive-sense rna genome approximately 27.5 kb in length (3). the virion contains four major structural proteins: a nucleocapsid (n) protein associated with the viral rna, the integral membrane (m) glycoprotein, a small membrane (sm) protein, and the spike (s) glycoprotein. the s glycoprotein is a polypeptide of approximately 1200 amino acids. it is proteolytically cleaved after translation into two subunits, s1 and s2 (4). both subunits are glycosylated with high mannose, n-linked oligosaccharides (5). the virion spike is thought to be an oligomeric protein composed of two polypeptides each of the s1 and s2 subunits. the two subunits associate by noncovalent forces and retain their three-dimensional shape by way of intrapeptide, but not interpeptide, disulfide bridges (5). the s2 subunits, which form the stalk portion of the spike, anchor it in the membrane, whereas the s1 subunits form the globular head of the spike glycoprotein (5). the s1 subunit contains amino acids involved in the induction of neutralizing, serotype specific, and hemagglutination inhibiting antibodies (6, 7). although the s1 subunit of ibv has been examined extensively, the s2 subunit remains enigmatic. based on the highly conserved nature of the s2 subunit among different members of the coronavirus genus and different strains of ibv, it would appear that it plays little or no role in the induction of a host immune response (8). however, it has been shown for ibv that an immunodominant region localized in the n-terminal half of the s2 subunit can induce neutralizing, but not serotype specific, antibodies (9). a dna-binding protein region or leucine zipper motif has also been identified in the s2 subunit of other coronaviruses (10). leucine zipper motifs are thought to be involved in transcriptional activation. (*corresponding author. e-mail: mjackwoo@arches.uga.edu)
furthermore, site-directed mutagenesis of the s2 subunit of another coronavirus, mouse hepatitis virus (mhv), inhibited the binding of a virus neutralizing monoclonal antibody to the unchanged s1 subunit (11). last, a monoclonal antibody neutralization resistant mutant was reported to have an s1 gene sequence identical to the parental virus, suggesting that the mutant escapes neutralization due to changes in the s2 gene sequence (12). thus, we are interested in examining the s2 gene and its deduced amino acid sequence of ibv strains in an attempt to determine if it plays a role in the binding of s1 subunit specific antibodies to the virus. we selected four strains belonging to the arkansas serotype, ark 99, ark dpi, 3668-4, and gav 92, because their s1 deduced amino acid sequences were very similar, >90%. strains 3668-4 and gav 92 were determined to be ark-like strains by restriction fragment length polymorphism (rflp) analysis and later confirmed by serology studies (13). we also selected connecticut 46 and florida 18288 for s2 gene sequencing because these strains are known to share 96.6% deduced amino acid identity for their s1 subunits, yet remain serologically distinct (14, 15). infectious bronchitis virus strains used in this study are listed in table 1. viruses were inoculated into embryonating eggs for propagation (16). the allantoic fluid was harvested and stored at −70 °c until needed. the boehringer mannheim (bm) high pure pcr template preparation kit (indianapolis, in) was used to extract viral rna from allantoic fluid per the manufacturer's directions. the s2 gene of the ibv strains was amplified using primers that flanked both sides of the entire s2 gene. the 3′ pcr primer (5′-ttgaatcattaaacagac-3′) was designated s2-3′ark, and the 5′ pcr primer (5′-gtaggtattcttacttcacgta-3′) was designated s2-5′ark.
the relative primer positions, using the atg start site for the beaudette strain s1 gene (m95169) as 1, were 1516 to 1537 for s2-5′ark and 3480 to 3497 for s2-3′ark. the reverse transcriptase (rt) and polymerase chain reaction (pcr) were conducted as previously described (17). the amplicon was purified and concentrated using genelute™ spin columns (supelco, bellefonte, pa 16823-0048) and microcon™ 30 columns (amicon, beverly, ma 01915), respectively. [...] (20)) and hydrophobicity plots (hopp and woods (21) and kyte and doolittle (22)) using s2 deduced amino acid sequence data were done with computer algorithms using macdnasis pro v3.5 and lasergene v 3.12. there was a high nucleotide similarity for the s2 genes from the ibv strains used in this study (table 2). the s2 gene sequence for the related arkansas serotype strains ark 99 and ark dpi were identical, while 3668-4 and gav 92 were respectively 98.9% and 98.6% similar to both ark 99 and ark dpi. the s2 gene nucleotide sequences of the florida 18288 and connecticut 46 strains were 99.8% similar. the deduced amino acid sequence of the s2 subunit was also compared; there were few amino acid differences among all the ibv strains (fig. 1). the strains 3668-4 and gav 92 had 7 and 11 amino acid substitution differences, respectively, when compared with the ark 99 and ark dpi strains. the florida 18288 and connecticut 46 strains had only two differences between themselves, and both were nonconservative. sequence data for the s2 genes of other ibv strains was used to construct a phylogenetic tree for the deduced amino acid sequence of the s2 subunit (fig. 2). in the alignment, members of the u.s. serotypes arkansas, mass, connecticut, florida, and foreign serogroups b and c, fall into the same groupings as observed when deduced amino acid sequence data for the s1 subunit is used for phylogenetic analysis (fig. 3).
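the primer coordinates given at the start of this passage can be cross-checked against the primer sequences with a few lines of python. the sketch below assumes inclusive, 1-based coordinates on the beaudette numbering and uses the primer strings as printed in the text; the resulting amplicon length is only the value expected on that reference, since individual strains may carry insertions or deletions.

```python
# primer sequences as printed in the text (5' to 3')
S2_5_ARK = "gtaggtattcttacttcacgta"  # 5' pcr primer, positions 1516 to 1537
S2_3_ARK = "ttgaatcattaaacagac"      # 3' pcr primer, positions 3480 to 3497


def span_length(start: int, end: int) -> int:
    """Number of bases covered by an inclusive, 1-based coordinate range."""
    return end - start + 1


if __name__ == "__main__":
    print("5' primer:", len(S2_5_ARK), "nt, span", span_length(1516, 1537))
    print("3' primer:", len(S2_3_ARK), "nt, span", span_length(3480, 3497))
    # expected product on the reference numbering, primers included
    print("expected amplicon (bp):", span_length(1516, 3497))
```

both primer lengths agree with their coordinate spans, which supports reading the positions as inclusive.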
however, the range of percent similarities was much less for the s2 subunit sequence than that observed for the s1 subunit sequence data. there were no amino acid differences within the immunodominant region of the s2 subunit for the arkansas serotype strains (approximately the first 30 residues). there were also no differences between the connecticut 46 and florida 18288 strains in the immunodominant region; however, there were differ[...] hydrophobicity plots using the hopp and woods (21) algorithm gave identical values of −0.24 ± 0.01 for each strain. however, there were differences in the predicted secondary structures using the method of chou and fasman (19). the predicted secondary structure of the s2 subunit of the ark 99 and ark dpi strains were identical due to their identical protein sequence. the predicted secondary structure of the s2 subunit of the 3668-4 strain differed from that of the ark 99 and ark dpi strains due to amino acid substitutions at position 50 (e to g) and 70 (h to n) that resulted in the addition of two turns (fig. 4). the gav 92 strain differed tremendously due to an amino acid substitution at position 50 (e to g), resulting in an odd number of turns between amino acids 40 and 75. the odd number of turns resulted in a 180° flip in the middle of the predicted secondary structure. the predicted secondary structure for the s2 subunit of the florida 18288 and connecticut 46 strains was remarkably different (fig. 5). there were two nonconservative amino acid changes at positions 227 and 274. the alanine residue at position 227 for the connecticut 46 strain was changed to a threonine residue for the florida 18288 strain. this resulted in the changing of some amino acid residues from helix to sheet and the addition of a turn in the predicted secondary structure of the s2 subunit for the florida 18288 strain. the histidine residue at position 274 for the connecticut 46 strain was changed to a tyrosine residue for the florida 18288 strain.
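to see why a whole-protein hydrophobicity value can stay essentially constant while a single substitution still changes the local profile, the sketch below averages the kyte and doolittle hydropathy scale over a sequence and over a sliding window. it is only an illustration: the peptide is a made-up ten-residue example rather than ibv s2 sequence, the window size of 7 is arbitrary, and the values reported in the text came from the hopp and woods scale in commercial software, not from this code.

```python
# kyte and doolittle (1982) hydropathy index, one-letter amino acid codes
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str) -> float:
    """Average hydropathy over the whole sequence."""
    return sum(KD[aa] for aa in seq.upper()) / len(seq)


def window_profile(seq: str, window: int = 7) -> list:
    """Sliding-window average hydropathy, one value per full window."""
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    toy = "MKTLEAVSLG"                # hypothetical peptide, not from ibv
    mutant = toy[:4] + "G" + toy[5:]  # e to g substitution at index 4
    print("mean shift:", mean_hydropathy(mutant) - mean_hydropathy(toy))
    print("wild-type profile:", window_profile(toy))
    print("mutant profile:   ", window_profile(mutant))
```

an e to g change shifts the summed score by 3.1 units, so its effect on the mean shrinks in proportion to sequence length, while every window covering the site shifts by 3.1 divided by the window size.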
this resulted in the changing of some amino acid residues from helix to sheet and the addition of a coil in the secondary structure of the s2 subunit for the florida 18288 strain. we analyzed six strains of ibv in the arkansas, connecticut, and florida serotypes. although s2 sequence data are more conserved among different strains of ibv than s1 sequence data, it appears that strains can be grouped into serotypes based on s2 gene nucleotide sequence data, as well as deduced amino acid sequence for the s2 subunit. this agrees with s1 gene phylogenetic trees for u.s. and international viruses. the only exception for grouping is between the connecticut and florida serotypes, which cannot be grouped into different serotypes using s1 gene or deduced amino acid sequence data, but can be separated serologically (14, 15). based on the secondary structure predictions using the chou and fasman (19) algorithm it appears that only a few amino acid changes in the correct location can alter the shape of the s2 subunit. one change in the gav 92 s2 deduced amino acid sequence (position 50, e to g) led to a 180° flip in the secondary structure prediction of the s2 subunit. the two nonconservative amino acid changes between the florida 18288 and connecticut 46 strains led to radically different secondary structure predictions. it is plausible that these s2 subunit secondary structure changes could affect the tertiary structure of the s2 subunit, thereby creating different interactions between the s1 and s2 glycoproteins that could change the quaternary structure of the spike glycoprotein. such changes would affect antibody binding and therefore account for serologic differences between gav 92, 3668-4, and arkansas viruses as well as the serotype differences between the connecticut and florida strains. the s1 and s2 subunits are known to interact by noncovalent attractive forces (5).
other research on a different coronavirus, mouse hepatitis virus, by grosse et al., showed that a single amino acid change in the s2 subunit could create an s1 subunit specific monoclonal antibody resistant mutant (11). this suggests that the interaction between s1 and s2 subunits may determine the shape or availability of s1 subunit specific epitopes. whether the s2 subunit is actually involved in s1 subunit specific antibody recognition, sterically hinders antibody from binding to the s1 subunit, or affects the presentation of s1 subunit epitopes is not known. however, from our sequence data we hypothesize that the s2 subunit can affect binding of s1 subunit specific antibody due to s2 gene variability and subsequent secondary structure differences. the nucleotide sequence data reported in this paper have been submitted to the genbank nucleotide sequence database and have been assigned the following accession numbers: arkansas 99, af094814; arkansas dpi, af094815; 3668-4, af094816; gav 92, af094817; connecticut 46, af094818; florida 18288, af094819.
the coronaviridae; diseases of poultry. iowa state university press; fundamental virology; the coronaviridae; arch of virol 142, 2249-2256; a laboratory manual for the isolation and identification of avian pathogens; proceedings from the national academy of science
figure caption (fragment): secondary structure prediction of the s2 glycoprotein for the connecticut 46 and florida 18288 strains of ibv using the chou and fasman [...]
key: cord-005207-02wmt2e9 authors: lee, hee-kyung; yeo, sang-geon title: cloning and sequence analysis of the nucleocapsid gene of porcine epidemic diarrhea virus chinju99 date: 2003 journal: virus genes doi: 10.1023/a:1023447732567 sha: doc_id: 5207 cord_uid: 02wmt2e9 the nucleocapsid (n) gene of the porcine epidemic diarrhea virus (pedv) chinju99, which was previously isolated in chinju, korea, was cloned and sequenced to establish the information for the development of genetically engineered diagnostic reagents.
also, sequences of the nucleotides and deduced amino acids of the chinju99 n gene were analyzed by alignment with those of cv777 and br1/87. the nucleotide sequence encoding the entire n gene open reading frame (orf) of chinju99 was 1326 bases long and encoded a protein of 441 amino acids with predicted m_r of 49 kda. it consisted of 405 adenine (30.5%), 293 cytosine (22.1%), 334 guanine (25.2%) and 294 thymine (22.2%) residues. the chinju99 n orf nucleotide sequence showed 96.5% and 96.4% identity with that of cv777 and br1/87, respectively. the chinju99 n protein revealed 96.8% amino acid identity with that of both br1/87 and cv777. the amino acid sequence contained seven potential sites for threonine (t)- or serine (s)-linked phosphorylation by each of protein kinase c and casein kinase ii. porcine epidemic diarrhea virus (pedv) causes an acute infection in piglets 1-2 weeks of age, and the disease is characterized by severe enteritis and diarrhea, leading to death with mortality up to 90% [1, 2]. pedv is a member of the genus coronavirus of the family coronaviridae [3]. the genome consists of a single molecule of positive-sense, single-stranded rna, 27-32 kb in size, which is transcribed into a nested set of several 3′-coterminal subgenomic mrnas for the production of structural and nonstructural proteins [3, 4]. among structural proteins of the virion, spike (s) glycoprotein (180-220 kda) plays an important role in the attachment of the virion to the host's receptors and penetration into the intestinal villous cells by fusion. the s glycoprotein also induces the production of neutralizing antibodies in the host [5-7], and therefore is an important substance for the immunity against pedv. on the other hand, nucleocapsid (n) protein (55-58 kda) is known as a basic phosphoprotein associated with the genome [1, 3, 8, 9], which can be the target for the accurate and early diagnosis of pedv infection by molecular techniques.
cloning and nucleotide sequencing have been done on these genes of cv777 and br1/87 strains [5, 10]. the gene products can be a feasible alternative for developing genetically engineered vaccines and diagnostic reagents. since isolation of pedv in korea was first reported in 1993 [11], the virus has been one of the major causes of death of suckling piglets in pig farming. park et al. [12] cloned a dna of 750 bases from the n gene of the viral rna in swine feces, but no further studies on viral isolation and gene cloning have been reported. for the development of genetically engineered proteins as diagnostic reagents against pedv, molecular characterization of the n gene remains rudimentary and still needs further elucidation. pedv infections occur frequently in korea, and developmental efforts should be geared toward rapid diagnosis and control of the disease. to our knowledge, nucleotide sequences of the full-length n gene of korean pedv isolates have not been reported. in the present study, a dna clone was constructed for the full-length n gene open reading frame (orf) of pedv isolated in chinju, korea. the complete sequences of nucleotides and deduced amino acids of the n gene were determined, and further analyzed with those of other pedvs for the information in the production of genetically engineered diagnostic reagents. a strain of pedv, chinju99, which was previously isolated from the intestinal tissues of piglets suffering from severe diarrhea by the virology laboratory of gyeongsang national university college of veterinary medicine, chinju, korea (data not shown), was used. the virus was propagated in a monolayer of vero cells grown in minimal essential medium (mem) containing streptomycin (100 µg/ml), penicillin (100 u/ml) and trypsin (10 µg/ml) in a 5% co2 incubator at 37 °c following the methods of hofmann and wyler [13]. when syncytial formation appeared in the vero cells after propagation of the virus, the spent mem was removed.
the cells were washed with pbs (ph 7.2) and lysed by trizol® reagent (invitrogen, usa) at 2 ml per tissue culture flask (25 cm2), and homogenized by passing the cell lysate several times through a pipette. viral rna was extracted from the homogenate following the manufacturer's suggestions and dissolved in diethyl pyrocarbonate-treated distilled water. a pair of sense and antisense primers was designed and aligned based on nucleotide sequences of the n gene of cv777 and br1/87 [10, 14] from the genbank database (national center for biotechnology information, usa). the sense primer nf1 (5′-ccgagtgcggttctcacagat-3′) and antisense primer nr1 (5′-catagccaggataagccggtc-3′) were used to generate cdna for the n gene of chinju99, and the relative positions of the primers are shown in fig. 2. synthesis of the first-strand cdna for the n gene was carried out by reverse transcription (rt) using the superscript ii® reverse transcriptase reagent kit (invitrogen) following the manufacturer's suggestions. the viral rna was mixed with 1 µl of 100 pm of the antisense primer, 4 µl of 5x first-strand buffer, 1 µl of 10 mm dntp mixture, 2 µl of 0.1 m dtt, 1 µl of rnase inhibitor (40 u/µl), 1 µl of reverse transcriptase (200 u/µl) and brought to 20 µl with distilled water. the reaction mixture was incubated for 50 min at 42 °c, and the reaction was stopped by heat for 15 min at 70 °c. to degrade the rna template, the reaction mixture was treated with rnase h (1 u) for 20 min at 37 °c. the ds-cdna for the n gene was synthesized by polymerase chain reaction (pcr) using a reagent kit (perkin-elmer, usa). a 10 µl portion of the first-strand cdna template was added to 5 µl of 10x pcr buffer, 4 µl of 25 mm mgcl2, 1 µl of 10 mm dntp mixture, 1 µl of each 100 pm sense and antisense primers, 1 µl of taq dna polymerase (5 u/µl) and brought to 50 µl with distilled water.
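the final reagent concentrations implied by the two recipes above follow from the dilution relation c_final = c_stock × v_added / v_total. the sketch below works them out for the components whose stock concentrations are unambiguous; the function and dictionary names are illustrative. because only volume ratios enter the calculation, the results hold whichever volume unit is read from the source, provided the same unit is used throughout.

```python
def final_concentration(stock: float, volume_added: float, total_volume: float) -> float:
    """Concentration after dilution, in the same unit as the stock."""
    return stock * volume_added / total_volume


# pcr mix as listed in the text (volumes in one common unit, total 50)
PCR_TOTAL = 50
PCR_COMPONENTS = {
    "first-strand cdna": 10,
    "10x pcr buffer": 5,
    "25 mm mgcl2": 4,
    "10 mm dntp mixture": 1,
    "sense primer": 1,
    "antisense primer": 1,
    "taq dna polymerase": 1,
}
WATER_VOLUME = PCR_TOTAL - sum(PCR_COMPONENTS.values())

if __name__ == "__main__":
    print("pcr mgcl2 (mm):", final_concentration(25, 4, PCR_TOTAL))
    print("pcr dntp (mm):", final_concentration(10, 1, PCR_TOTAL))
    print("pcr buffer (x):", final_concentration(10, 5, PCR_TOTAL))
    print("water to add:", WATER_VOLUME)
    # reverse transcription mix, total volume 20
    print("rt buffer (x):", final_concentration(5, 4, 20))
    print("rt dtt (mm):", final_concentration(100, 2, 20))  # 0.1 m = 100 mm
    print("rt dntp (mm):", final_concentration(10, 1, 20))
```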
the pcr was carried out in a thermocycler (perkin-elmer) following the program of 2 min at 94 °c and 30 cycles of 1 min at 94 °c, 1 min at 55 °c and 1 min at 72 °c, and a final extension at 72 °c for 5 min. the pcr products were resolved by electrophoresis in 1% agarose gel. following the routine methods in gene cloning [15], the pcr-generated n gene ds-cdnas were blunt-ended with klenow enzyme (2 u) and 1 µl of 0.5 mm dntps (invitrogen) in 20 µl reaction volume and cloned into the smai site of ptz19r plasmid dna by ligation using t4 dna ligase (1 u) (invitrogen). the recombinant plasmid dnas were transformed into competent escherichia coli dh5α cells by heat shock for 45 s at 42 °c. after adding soc medium (0.5% yeast extract, 2% tryptone, 10 mm nacl, 2.5 mm kcl, 10 mm mgcl2, 20 mm mgso4, 20 mm glucose), the tube was shaken for 1 h at 220 rpm, 37 °c. the transformed cells were plated onto luria bertani (lb) agar (invitrogen) containing ampicillin (50 µg/ml), x-gal (40 µg/ml) and isopropylthio-β-galactoside (20 µg/ml) (invitrogen) and incubated overnight at 37 °c. transformed colonies were cultured in lb broth with ampicillin (50 µg/ml) by shaking at 220 rpm, overnight, at 37 °c, and were subjected to dna extraction by alkaline lysis, restriction enzyme digestion and electrophoresis in 1% agarose gel for the identification of recombinant dna clones. nucleotide sequencing was done for the n gene-recombinant dna clones using a dye terminator cycle sequencing kit (perkin-elmer) on an automatic sequencer (abi prism 377, advanced biotechnologies, usa). the sequences of nucleotides and deduced amino acids were analyzed by clustalw, version 1.82, using data available from genbank and the european molecular biology laboratory (embl). n gene nucleotide and amino acid sequences of chinju99 were compared with cv777 and br1/87 [10] (embl accession no. z14976).
the protein chemistry of chinju99 amino acids was analyzed using the protein statistics programs pepstats (pasteur institute, france) and predictprotein (embl). in the synthesis of ds-cdna of the chinju99 n gene, a dna fragment of approximately 1.4 kb was amplified by rt-pcr using primers specific to the n gene of pedv. the dna was cloned into ptz19r vector dna (fig. 1) and subjected to sequencing. the nucleotide sequence encoding the entire chinju99 n gene was 1326 bases in length and contained a single orf. the gene had 46 and 48 nucleotide mismatches compared to cv777 and br1/87, respectively (fig. 2). it consisted of 405 adenine (30.5%), 293 cytosine (22.1%), 334 guanine (25.2%) and 294 thymine (22.2%) nucleotides, and a gc content of 47.3%. the gene showed 96.5% and 96.4% nucleotide sequence identity to that of cv777 and br1/87, respectively. the chinju99 n gene encoded a protein of 441 amino acids with predicted m_r of 49 kda. there were seven potential threonine (t)- or serine (s)-linked phosphorylation sites for each of protein kinase c and casein kinase ii recognized in the protein. the chinju99 n protein had 14 amino acid mismatches compared to those of cv777 and br1/87 (fig. 3) and showed 96.8% amino acid sequence identity with these strains. bridgen et al. [10] previously cloned a gene of 1326 nucleotides in a single large orf capable of encoding a 441 amino acid protein of 49 kda from pedv cv777 and br1/87, which was very similar in both length and sequence to coronavirus n proteins, and therefore represented it as the pedv n gene. in the present study, the n gene of the pedv chinju99 was cloned and sequencing was done for the cdna clones. the resulting sequence data showed a single orf of 1326 nucleotides encoding a protein of 441 amino acids with m_r of 49 kda predicted by the pepstats program.
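the composition and identity figures reported above are mutually consistent, as the following python sketch shows using only counts taken from the text. two assumptions are ours: percent identity is taken as 1 minus mismatches divided by alignment length over an ungapped alignment, and the 1326 bases are taken to include the stop codon. the molecular mass estimate uses the common rule of thumb of about 110 da per residue and is only a rough check on the predicted 49 kda.

```python
# base counts for the chinju99 n gene orf, as reported in the text
BASE_COUNTS = {"a": 405, "c": 293, "g": 334, "t": 294}
ORF_LENGTH = sum(BASE_COUNTS.values())


def percent(part: float, whole: float) -> float:
    return 100.0 * part / whole


def percent_identity(mismatches: int, length: int) -> float:
    """Identity over an ungapped alignment of the given length (assumption)."""
    return 100.0 * (1 - mismatches / length)


if __name__ == "__main__":
    print("orf length:", ORF_LENGTH)
    for base, count in BASE_COUNTS.items():
        print(base, round(percent(count, ORF_LENGTH), 1))
    gc = BASE_COUNTS["g"] + BASE_COUNTS["c"]
    print("gc content (%):", round(percent(gc, ORF_LENGTH), 1))
    codons = ORF_LENGTH // 3
    print("codons:", codons, "residues excluding stop:", codons - 1)
    print("nt identity vs cv777 (%):", round(percent_identity(46, ORF_LENGTH), 1))
    print("nt identity vs br1/87 (%):", round(percent_identity(48, ORF_LENGTH), 1))
    print("aa identity (%):", round(percent_identity(14, 441), 1))
    print("rough mass (kda):", round(441 * 110 / 1000, 1))  # ~110 da per residue
```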
the chinju99 n protein also had 96.8% amino acid sequence identity with those of cv777 and br1/87 [10], although 14 amino acid mismatches were recognized. therefore, the chinju99 n protein shared the features of the nucleotide and putative amino acid sequences of cv777 and br1/87, although the pedv n protein is known to have an mr of 55-58 kda by polyacrylamide gel electrophoresis [8, 9]. the pedv n protein is known as a phosphorylated structural protein associated with the viral genome [1, 3, 8, 9], which appears abundantly in virus-infected cells [9]. therefore, the appearance of n protein can be a clue to the replication of pedv and can be used for early and accurate diagnosis, provided that the virus replicates in the infected cells. the chinju99 n protein had seven potential t- or s-linked phosphorylation sites each for protein kinase c and casein kinase ii. similarly, cv777 and br1/87 [10] contained six serine (s) residues as possible phosphorylation sites for these enzymes, although some of the s-linked phosphorylation sites differed from those of chinju99. in conclusion, the full-length nucleotide sequence of the coding region of the n gene of pedv chinju99 was determined in the present study. the nucleotide and putative amino acid sequences of the chinju99 n gene were analyzed by comparison with those of other pedvs. however, we could elucidate the molecular properties of the n gene only by comparison with cv777 and br1/87, because the full-length nucleotide sequence of the pedv n gene has been determined only in these strains. nevertheless, it was recognized that the chinju99 n gene has minor differences in the structural features of the putative protein compared to those of cv777 and br1/87. this information may be useful for the development of genetically engineered n protein for the rapid and accurate diagnosis of pedv infections in korea. 
moreover, the genetic information gained from the chinju99 n gene can be used for diagnostic work such as pcr and nucleic acid hybridization. to our knowledge, this is the first published report on the full-length nucleotide sequence and molecular characteristics of the n gene of korean pedv isolates. and br1/87 [10]: only the amino acids of cv777 and br1/87 that mismatched the chinju99 sequence are included; *, translation termination; the seven potential threonine (t)- or serine (s)-linked phosphorylation sites for protein kinase c are underlined; the seven potential t- or s-linked phosphorylation sites for casein kinase ii are in italics. diseases of swine. hagan and bruner's microbiology and infectious diseases of domestic animals. fields virology. lippincott-raven publishers. short protocols in molecular biology. this study was supported by a grant (no. 981-0613-065-2) from the korea science and engineering foundation (kosef), ministry of science, korea. key: cord-282126-gmjnbnx5 authors: yang, limin; li, jing; bi, yuhai; xu, lei; liu, wenjun title: development and application of a reverse transcription loop-mediated isothermal amplification method for rapid detection of duck hepatitis a virus type 1 date: 2012-08-07 journal: virus genes doi: 10.1007/s11262-012-0798-6 sha: doc_id: 282126 cord_uid: gmjnbnx5 we developed and evaluated a reverse transcription loop-mediated isothermal amplification (rt-lamp) assay for detecting duck hepatitis a virus type 1 (dhav-1). the amplification could be finished in 1 h under isothermal conditions at 63 °c by employing a set of four primers targeting the 2c gene of dhav-1. the rt-lamp assay showed higher sensitivity than the rt-pcr, with a detection limit of 0.1 eld50 per 0.1 ml of dhav-1. the rt-lamp assay was highly specific; no cross-reactivity was observed with samples of other related viruses, bacteria, allantoic fluid of normal chicken embryos, or the livers of uninfected ducks. 
thirty clinical samples were subjected to detection by rt-lamp, rt-pcr, and virus isolation, which obtained completely consistent, positive results. as a simple, rapid, and accurate detection method, this rt-lamp assay has important potential applications in the clinical diagnosis of dhav-1. duck hepatitis virus type 1 (dhv-1), a member of family picornaviridae and genus avihepatovirus, is a kind of single-stranded rna virus that causes an acute, highly lethal disease in young ducklings called duck hepatitis. duck hepatitis leads to severe economic losses for duck raising farms. duck hepatitis virus includes three serotypes dhv-1, dhv-2, and dhv-3. dhv-1 is distributed widely, while dhv-2 and dhv-3 have only been reported in the uk and the usa, respectively [1] [2] [3] [4] [5] [6] [7] . duck hepatitis caused by dhv-1 can lead to mortality up to 95 % in young ducklings during the first week of life, thus accurate and efficient diagnosis is extremely useful to control the initial disease outbreak [1] . recently, dhv-1 was renamed to dhav, and dhav has three genotypes (dhav-1, 2 ,and 3) [8] [9] [10] [11] . dhav-1 is distributed widely and prevalent in china, dhav-2 have been only isolated in taiwan until now [9, 10] , while dhav-3 was first isolated in south korea [11] , but now it is also epidemic in mainland of china. the traditional detection methods, including virus isolation and neutralization tests, are generally reliable for the diagnosis of dhv-1 [12] , but these methods have shortcomings, such as labor intensive, time consuming, and have insufficient sensitivity which cannot detect extremely low viral loads. to address this, a virus antigen-based elisa was first established in 1991 [13] , then a recombinant vp1 protein-based elisa was developed, which showed agreement with the neutralization test [14] . 
nucleic acid-based assays such as rt-pcr, real-time rt-pcr, and real-time quantitative pcr were developed and showed high specificity and sensitivity [1, 8, 15-17]. however, these assays need specialized and expensive equipment such as a thermal cycler or real-time pcr system, so their application in rural areas is limited. the loop-mediated isothermal amplification (lamp) assay, developed in 2000 [18], is a novel nucleic acid amplification method that proceeds under isothermal conditions. this method employs a dna polymerase and a set of four specially designed primers that recognize a total of six distinct sequences on the target dna, which can therefore be amplified with high specificity. lamp accumulates 10^9 copies of the target in less than an hour [18]. as a simple and efficient diagnostic technique, lamp has been used in the detection of various rna or dna viruses, such as avian leukosis virus [19, 20], barley yellow dwarf virus [21], swine transmissible gastroenteritis coronavirus [22], avian influenza virus [23], and foot-and-mouth disease virus [24]. here, we report a one-step, single-tube rt-lamp assay for the rapid detection of dhav-1 and an assessment of its specificity and sensitivity. this method has potential applications in the early diagnosis and forecasting of dhav-1. the dhav-1 strain (dhav-sd, stored at the china general microbiological culture collection center, cgmcc no.3746) was propagated in the allantoic cavities of 9-day-old spf chicken embryos. the embryos that died 36-72 h post inoculation were collected. allantoic fluid was centrifuged (1,000 × g at 4°c for 10 min) and the suspension was stored at -80°c until it was used for rna extraction [1]. 
duck enteritis virus (dev), muscovy parvovirus (mpv), avian influenza virus (aiv, h9n2), riemerella anatipestifer (ra), salmonella enteritidis, and escherichia coli (o78), which were maintained in our laboratory, were propagated and the nucleic acids were extracted [1, 25-28]. total rna was extracted from allantoic fluid and liver samples using trizol reagent (invitrogen, carlsbad, usa) according to the manufacturer's instructions. dhav-1 total rna concentration was measured spectrophotometrically at a260 and a280. this rna was stored at -80°c before use. the primers for the rt-lamp amplification of dhav-1 were designed based on the conserved region in the 2c gene (genbank accession no. jx183548). primers f3, b3, fip, and bip were designed with the primer software primer explorer v4 (http://primerexplorer.jp/elamp4.0.0/index.html; eiken chemical co., japan). the primer sequences are shown in table 1. the rt-lamp reaction was carried out in a total reaction volume of 25 µl containing 1× thermopol reaction buffer, 8 u of bst dna polymerase, 10 u of amv reverse transcriptase (new england biolabs, ma, usa), 1 mm dntp mix (newpep, beijing, china), 0.8 m betaine, 6 mm mgso4, 0.2 µm of each of the f3 and b3 primers, 0.8 µm of each of the bip and fip primers, and 1.0 µl of the target rna. the mixture was incubated at 63°c for 1 h followed by 5 min at 80°c. after the reaction, the amplified dna products were detected by electrophoresis on a 1.5 % agarose gel (biowest agarose, spain) followed by ethidium bromide staining under ultraviolet light [21]. in order to compare the sensitivity of the rt-lamp assay with other conventional assays, an rt-pcr assay was developed using two pairs of primers (for and rev; f3 and b3) according to the earlier report with some changes (table 1) [8]. 
the rt-pcr was carried out in a 25 µl total reaction volume using the one-step rt-pcr kit (newpep, beijing, china) with 0.2 µm of each of the upstream and downstream primers and 1 µl of target rna. sensitivity comparison of rt-lamp to rt-pcr. to determine the detection limits of the rt-lamp and rt-pcr assays, dhav-1 total rnas were extracted from serial 10-fold dilutions of allantoic fluid, ranging from 10^4 to 10^-3 50 % egg lethal doses (eld50) per 100 µl. this single dilution series was used as the template for the two assays. the products were detected by agarose gel electrophoresis as described above (1.5 % agarose, tae) [21]. to assess the specificity of rt-lamp, including potential cross-reactions, nucleic acids from dhav-1, dev, mpv, aiv, r. anatipestifer (ra), s. enteritidis, and e. coli (o78) were examined. total rna from the allantoic fluid of normal chicken embryos and from the livers of uninfected ducks was also assayed. to evaluate the reliability of the rt-lamp assay, 30 clinical liver samples were collected from dhav-suspected ducks in different provinces of china, including shandong, hebei, sichuan, and beijing. rna was extracted from these samples and tested by both rt-lamp and rt-pcr. the products were detected by agarose gel electrophoresis (1.5 % agarose, tae). virus isolation was also applied to the 30 clinical liver samples using the method previously described [8]. in order to obtain greater specificity and to detect multiple strains of dhav-1, the rt-lamp primers were designed based on a highly conserved region of the 2c gene of the dhav-1 strain. the one-step, single-tube rt-lamp assay was optimized with the selected primer set by varying the ratio of the concentrations of mgso4 and dntp, the reaction temperature, and the reaction time. 
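the rt-lamp reaction composition given in the methods can be converted into absolute amounts per tube, which is often how a bench worksheet is prepared. the python sketch below does this conversion for the 25 µl reaction, using only the final concentrations stated in the text; it also reports the inner-to-outer primer ratio and the mgso4-to-dntp ratio, the latter being one of the parameters varied during optimization. the variable and function names are illustrative, and stock concentrations are deliberately not assumed because the source does not give them.

```python
REACTION_VOLUME_UL = 25.0  # total rt-lamp reaction volume from the methods

# final concentrations from the methods, all expressed in micromolar
final_conc_um = {
    "f3": 0.2,        # outer primer
    "b3": 0.2,        # outer primer
    "fip": 0.8,       # inner primer
    "bip": 0.8,       # inner primer
    "dntp": 1000.0,   # 1 mM dNTP mix
    "mgso4": 6000.0,  # 6 mM MgSO4
}


def amount_pmol(conc_um, volume_ul):
    """Amount in pmol: 1 micromolar equals 1 pmol per microliter."""
    return conc_um * volume_ul


amounts_pmol = {
    name: amount_pmol(conc, REACTION_VOLUME_UL)
    for name, conc in final_conc_um.items()
}

# ratios that characterize the reaction design
inner_to_outer_primer_ratio = final_conc_um["fip"] / final_conc_um["f3"]
mg_to_dntp_ratio = final_conc_um["mgso4"] / final_conc_um["dntp"]

for name, pmol in amounts_pmol.items():
    print(f"{name}: {pmol:g} pmol per {REACTION_VOLUME_UL:g} ul reaction")
print(inner_to_outer_primer_ratio, mg_to_dntp_ratio)
```

under these figures each outer primer is present at 5 pmol and each inner primer at 20 pmol per reaction, a 4:1 inner-to-outer ratio.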
to compare the sensitivity of the rt-lamp assay with that of the conventional rt-pcr, the two assays were used to detect the same rnas, which were extracted from 10-fold serial dilutions (from 10^4 to 10^-3 eld50 per 100 µl) of allantoic fluid. dhav-1 total rna concentrations were also measured spectrophotometrically at a260 and a280; the corresponding rna range was from 2 × 10^4 pg to 2 × 10^-3 pg per assay. the results are shown in fig. 1. the detection limit of the rt-lamp assay was 0.1 eld50 per 100 µl, equivalent to 2 × 10^-1 pg dhav-1 total rna per reaction, a sensitivity 100-fold higher than that of the rt-pcr assay. in addition, the rt-pcr assays using the two primer pairs had the same sensitivity. the cross-reactivity of the dhav-1 rt-lamp assay was evaluated with rna from dev, mpv, aiv, ra, s. enteritidis, e. coli (o78), allantoic fluid of normal chicken embryos, and liver of uninfected duck. all these reactions were negative (fig. 2). to evaluate the feasibility of using rt-lamp to detect dhav-1 in clinical specimens, 30 clinical specimens collected over the past years were assayed by rt-lamp and rt-pcr. in parallel, virus isolation was also performed. the results showed that 11 of the 30 samples tested contained dhav-1 by virus isolation, and the same 11 clinical specimens were also positive by both rt-lamp and rt-pcr (fig. 3). the results of rt-lamp, rt-pcr, and virus isolation were 100 % concordant. several nucleic acid amplification techniques have been developed for the specific and sensitive detection of dhv-1, including rt-pcr and real-time pcr. however, these assays require considerable operator skill, expensive equipment, and 2-4 h for amplification; thus, the application of these assays is limited in the field. compared to traditional pcr technology, lamp has several advantages. first, lamp is more specific since it requires 4 or 6 primers to identify 6 or 8 specific domains [18, 19], while pcr uses only two primers. 
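the relationship between the dilution series and the reported detection limits can be laid out explicitly. the python sketch below rebuilds the 10-fold series from 10^4 to 10^-3 eld50 per 100 µl and converts each point to picograms of total rna. the conversion factor of 2 pg per eld50 is not stated directly in the text; it is an assumption inferred from the reported endpoints (10^4 eld50 corresponding to 2 × 10^4 pg). the rt-pcr detection limit is likewise derived here from the stated 100-fold difference rather than quoted from the source.

```python
# 10-fold dilution series, 10^4 down to 10^-3 ELD50 per 100 ul
exponents = range(4, -4, -1)

# assumed conversion, inferred from the reported range endpoints
PG_RNA_PER_ELD50 = 2.0

# list of (ELD50 per 100 ul, pg total RNA per assay) for each dilution
dilution_series = [
    (10.0 ** e, PG_RNA_PER_ELD50 * 10.0 ** e) for e in exponents
]

# reported rt-lamp detection limit and fold difference versus rt-pcr
LAMP_LIMIT_ELD50 = 0.1
FOLD_DIFFERENCE = 100

lamp_limit_pg = PG_RNA_PER_ELD50 * LAMP_LIMIT_ELD50
pcr_limit_eld50 = LAMP_LIMIT_ELD50 * FOLD_DIFFERENCE  # derived, not quoted
pcr_limit_pg = PG_RNA_PER_ELD50 * pcr_limit_eld50

for eld50, pg in dilution_series:
    lamp = "+" if eld50 >= LAMP_LIMIT_ELD50 * (1 - 1e-9) else "-"
    pcr = "+" if eld50 >= pcr_limit_eld50 * (1 - 1e-9) else "-"
    print(f"{eld50:>10g} eld50  {pg:>10g} pg  rt-lamp {lamp}  rt-pcr {pcr}")
```

on this reading, rt-lamp would be expected to score six of the eight dilutions as positive and rt-pcr four, though the actual gel results are those shown in fig. 1 of the source.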
second, lamp is more sensitive, because its amplification is more efficient than that of pcr. third, lamp does not require expensive and complex equipment; instead, it can be performed using a water bath or heat block for incubation under isothermal conditions. finally, lamp saves time: the assay can be completed within 1 h, whereas pcr typically requires 2-4 h [29]. in addition, the lamp amplification products can sometimes be observed directly by the naked eye, as a white precipitate of magnesium pyrophosphate can form during the reaction [30]. after a comparison of different dhav-1 subgroup genomes, the conserved 2c domain of the genome was selected as the target for lamp primer design and was used for screening a group of primers with good amplification efficiency. the 3d gene, which encodes an rna-dependent rna polymerase, has also been used to design primers for detection of dhv-1 in earlier reports [1, 17, 31]. given that many viruses have an rna polymerase gene, we preferred the 2c gene as the detection marker. a one-step rt-lamp assay with high specificity and sensitivity was developed for rapid diagnosis of dhav-1. it showed no cross-reaction with dev, mpv, aiv, r. anatipestifer (ra), s. enteritidis, or e. coli (o78), suggesting that this technique can distinguish dhav-1 from some common avian viruses and bacteria at the nucleic acid level. the rt-lamp had a detection limit of 0.1 eld50 per 100 µl, equivalent to 2 × 10^-1 pg dhav-1 total rna per reaction, which was 100 times more sensitive than the conventional rt-pcr. this suggests that the method is useful for the detection of low levels of dhav-1 and for confirming the early stages of dhav-1 infection, when viral titers are relatively low. the rt-lamp method was also used to detect dhav-1 in clinical samples. 
the results from the rt-lamp assay were consistent with the rt-pcr and viral isolation methods, further confirming the reliability of the rt-lamp assay. considering that dhav-1 rt-lamp has many advantages, such as being highly sensitive, simple, specific, less time consuming, and not requiring expensive equipment, it is therefore more suitable for use as a dhav-1 diagnostic tool in the field or rural areas than other nucleic acid-based assays. in summary, the dhav-1 rt-lamp assay we developed could be a potential diagnostic method for use in the surveillance, control, and molecular epidemiological screening of dhav-1 for using in developing countries. disease of poultry 11th edn acknowledgments financial support was provided by the special fund for the agro-scientific research in the public interest (201003012) and the nature science foundation of china (nsfc 31100644). key: cord-299573-vq6ckqtd authors: lee, meong-hun; jeoung, hye-young; park, hye-ran; lim, ji-ae; song, jae-young; an, dong-jun title: phylogenetic analysis of porcine astrovirus in domestic pigs and wild boars in south korea date: 2012-09-11 journal: virus genes doi: 10.1007/s11262-012-0816-8 sha: doc_id: 299573 cord_uid: vq6ckqtd porcine astrovirus (pastv) belongs to genetically divergent lineages within the genus mamastrovirus. in this study, 25/129 (19.4 %) domestic pig and 1/146 (0.7 %) wild boar fecal samples tested in south korea were positive for pastv. positive samples were mainly from pigs under 6 weeks old. bayesian inference (bi) tree analysis for rna-dependent rna polymerase (rdrp) and capsid (orf2) gene sequences, including mamastrovirus and avastrovirus, revealed a relatively geographically divergent lineage. the pastvs of hungary and america belong to lineage pastv 4; those of japan belong to pastv 1; and those of canada belong to pastv 1, 2, 3, and 5, but not to 4. this study revealed that the pastvs of korea belong predominantly to lineage pastv 4 and secondarily to pastv 2. 
it was also observed that pastv infections are widespread in south korea regardless of the disease state in domestic pigs and in wild boars as well. their association with enteric diseases is not well documented, with the exception of turkey and mink astrovirus infections [2] . family astroviridae is separated into two genera. viruses of the genus mamastrovirus infect mammals, and those of avastrovirus infect avian [3] . avastroviruses include duck astrovirus 1 (dastv-1), turkey astrovirus 1 and 2 (tastv-1 and tastv-2), and avian nephritis virus (anv) [2] . mamastroviruses appear to have a broad host range, including human [1] , sheep [4] , cow [5] , pig [6] , dog [7] , cat [8] , red deer [9] , mouse [10] , mink [11] , bat [12] , cheetah [13] , brown rat [14] , roe deer [15] , sea lion and dolphin [16] , and rabbit [17] . porcine astrovirus (pastv) was first detected by em in the feces of a diarrheic piglet [6] and was later isolated in culture [18] . molecular characterization of the capsid (orf2) gene from this isolate followed some years later [19] . since then, research groups have successfully used pcr approaches to investigate the presence and diversity of pastv [20] [21] [22] . pastv has been detected in several countries, including south africa [23] , the czech republic [20] , hungary [22] , canada [21] , and colombia [24] . in south korea, there have been studies done on astrovirus but were only limited to its detection in human infection. there has been no attempt yet to know the extent of astrovirus infection in the pig population of the country. it was, therefore, the aim of this study to investigate the genetic groups of korean pastv in domestic pigs and wild boars and to identify the incidence of co-infection with other porcine enteric viruses as well. 
a total of 129 fecal samples from domestic pigs (60 piglets under 3 weeks old, 45 weaned pigs, 14 growing-finishing pigs, and 10 sows over 1 year old) were collected from six pig farms with good breeding facilities in four provinces of south korea from january to june 2011. of these samples, 90 were from diarrheic and 39 were from non-diarrheic pigs. a total of 146 fecal samples from wild boars over 1 year old were collected from wildlife areas in five provinces of south korea during the hunting season from december 2010 to january 2011. of these samples, 34 were from diarrheic and 112 were from non-diarrheic boars. viral rna was extracted from the feces using trizol ls according to the manufacturer's instructions. pastv was detected in fecal specimens by rt-pcr, as previously described [22], with primers specific for the rdrp and orf2 regions of pastv (pastv-f: 5'-tgacattttgtggatttacagtt-3' and pastv-r: 5'-cacccagggctgacca-3'). the rt-pcr amplified a 799-nt-long fragment at an annealing temperature of 45°c. products of the expected size were cloned with the pgem-t vector system ii (promega, cat. no. a3610, usa). the cloned gene was sequenced with t7 and sp6 sequencing primers on an abi prism 3730xi dna sequencer (applied biosystems, foster city, ca, usa) at the macrogen institute (macrogen, seoul, korea). the sequences of all the samples positive for pastv were submitted to genbank under accession numbers jq696831-jq696856. the astroviruses used in this study are listed in table 1 along with their genbank accession numbers. to investigate the relationship between astroviruses and other economically important viral diseases that cause diarrhea in piglets in asia, screening tests were conducted to detect porcine epidemic diarrhea virus (pedv), transmissible gastroenteritis virus (tgev), and porcine group a rotavirus (gar), as previously described [25]. 
the primer pairs used in this study were p1 (ttctgagtcacgaacagcca, 1466-1485) and p2 (catatgcagcctgctctgaa, 2097-2116) for the s gene of pedv, t1 (gtggttttggtyrtaaatgc, 16-35) and t2 (cactaaccaacgtggarcta, 855-874) for the s gene of tgev, and rot3 (aaagatgctagggacaaaattg, 57-78) and rot5 (ttcagattgtggagctattcca, 344-365) for the segment 6 region of group a rotavirus. the expected products of the multiplex rt-pcr were 859 bp for tgev, 651 bp for pedv, and 309 bp for rotavirus, sizes that can be differentiated by agarose gel electrophoresis. out of the 129 domestic pig fecal samples tested, 25 were positive for pastv. the prevalence of pastv in weaned pigs (35.6 %, 16/45) was higher than that in suckling piglets (15.0 %, 9/60) and in growing-finishing pigs (7.1 %, 1/14) (table 2). only one wild boar, from gyunggi province, tested positive for pastv (0.7 %, 1/146). the low prevalence of pastv in wild boars might be because older animals (over 1 year old) are usually less susceptible to infection and because wild pigs are generally more resistant to many diseases than domesticated ones. the percentage of samples that were pastv-positive differed among the six pig farms: chungchong a, 28.6 % (10/35); chungchong b, 22.9 % (8/35); kangwon, 20.0 % (4/20); gyunggi a, 7.1 % (1/14); gyunggi b, 20.0 % (3/15), and gyungsang, 0 % (0/10). the low or zero incidence of pastv in gyunggi a (growing-finishing pigs) and gyungsang (sows) can likewise be attributed to the lower susceptibility of adult pigs to infection. the proportions of pastv-positive non-diarrheic and diarrheic pig fecal samples were 14.0 % (18/129) and 6.2 % (8/129), respectively. these results suggest that pastv is widespread in south korea regardless of the disease status (with or without clinical manifestations) of pigs. 
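the expected multiplex rt-pcr product sizes follow from the primer coordinates: each amplicon spans from the first base of the forward primer to the last base of the reverse primer, inclusive. the python sketch below recomputes the sizes and also checks that each primer's length matches its coordinate span. the coordinates are those given in the text, except that the tgev forward primer span of 16-35 is a reconstruction (the printed coordinates are garbled in this copy); it is the only span consistent with a 20-nt primer and the stated 859 bp product. the dictionary layout and function names are illustrative.

```python
# primer sequence and (start, end) coordinates on the target gene
primer_pairs = {
    "pedv": {
        "forward": ("ttctgagtcacgaacagcca", 1466, 1485),    # p1
        "reverse": ("catatgcagcctgctctgaa", 2097, 2116),    # p2
    },
    "tgev": {
        "forward": ("gtggttttggtyrtaaatgc", 16, 35),        # t1, span reconstructed
        "reverse": ("cactaaccaacgtggarcta", 855, 874),      # t2
    },
    "rotavirus": {
        "forward": ("aaagatgctagggacaaaattg", 57, 78),      # rot3
        "reverse": ("ttcagattgtggagctattcca", 344, 365),    # rot5
    },
}


def amplicon_size(pair):
    """Inclusive distance from forward-primer start to reverse-primer end."""
    _, fwd_start, _ = pair["forward"]
    _, _, rev_end = pair["reverse"]
    return rev_end - fwd_start + 1


def spans_match_lengths(pair):
    """True if every primer is exactly as long as its coordinate span."""
    return all(len(seq) == end - start + 1 for seq, start, end in pair.values())


sizes = {virus: amplicon_size(pair) for virus, pair in primer_pairs.items()}
consistent = {virus: spans_match_lengths(pair) for virus, pair in primer_pairs.items()}

print(sizes)
print(consistent)
```

the three products differ by at least about 200 bp, which is why they can be resolved from one another on an agarose gel.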
although astroviruses are highly prevalent in young pigs and are mostly present in diarrheic pigs, pastv is also a common finding in the fecal samples of apparently healthy pigs [21, 22]. the clinical significance of pastv infection remains to be clarified. the clinical symptoms of diarrhea are frequently reported to be associated with rotavirus, coronavirus, and calicivirus-like infections in piglets [6, 18, 20, 21, 26]. although pedv and tgev infections were not identified in any of the fecal samples, porcine gar infection was identified in 6.2 % (17/275) of the samples, all collected from suckling pigs under 3 weeks old (table 2). furthermore, coinfection with pastv and gar was observed in two cases (one diarrheic and one non-diarrheic pig fecal sample). however, it is not yet clear whether gar is directly associated with astrovirus infection in pigs. all astrovirus sequences were aligned initially with the clustalx 1.8 program [27]. the nucleotide sequences were translated, and the nucleotide and amino acid sequence identities among the astrovirus strains were calculated with bioedit 7.053 [28]. bayesian trees were generated with mrbayes 3.1.2 [29, 30] using best-fit models selected with prottest 1.4 [31] for the amino acid alignments. markov chain monte carlo (mcmc) analyses were run for 1,000,000 generations for each amino acid sequence alignment. bayesian posterior probabilities from mrbayes 3.1.2 were estimated on the basis of a 70 % majority rule consensus of the trees. for each analysis, a chicken astrovirus (nc_003790) was specified as the outgroup and a graphic output was produced with treeview 1.6.1 [32]. the best models for the rdrp and orf2 amino acid sequences, obtained using prottest 1.4, were wag + i + g and wag + g + f, respectively, according to the akaike information criterion (aic). bi trees for the rdrp (fig. 1) and orf2 (fig. 2) amino acid sequences revealed the presence of five unrelated pastv lineages. 
the first lineage (pastv 1) contained the poastv12-3 strain (hm756258) from canadian pig and two porcine strains (y15938 and ab037272) derived from japanese pigs, interestingly, rat astrovirus (hm450382) and porcine poastv cc12 (jn088537) have formed two different lineages on the bayesian trees for the rdrp and orf2 amino acid sequences (figs. 1, 2) . astrovirus strains contained in group 1 (g1), 2 (g2), and 4 (g4) on the two bayesian trees showed similar topologies. however, astrovirus strains in group 3 (g3) on the bayesian tree for the orf2 amino acid sequence were divided into g3 and group 5 (g5) on the bayesian tree for the rdrp amino acid sequence (figs. 1, 2) . a strain isolated from a hungarian wild boar in 2011 [33] belonged to pastv 4 or group 4 (g4) that also contained pastk31 derived from korean wild boar. a previous study suggested that the number of pastv lineages extends to a total of five, all of which most likely represent distinct species of different origins [34] . however, with the available astvs research data from countries around the world, future studies could unveil diverse genetic lineages. in this study, the porcine astrovirus strains appeared to be phylogenetically related to not only prototypical human astroviruses (as was already known) but also the recently discovered novel human strains. this finding suggests the existence of multiple cross-species transmission events between the hosts and the other animal species. several recent studies have shown that bats form multiple independent lineages [35, 36] . bat astrovirus strains in this study also showed independent lineages and specifically, the ld71 (fj571067) strain had a close relationship with astrovirus strains of human, sheep, mink, and sea lion (figs. 1, 2) . a previous study suggested that porcine astvs have played an active role in pigs in the evolution and ecology of the astroviridae [21] . 
recent studies have shown evidence of multiple recombination events between distinct pastv strains and between pastv and human astrovirus (hastv) in the variable region of orf2 [24], as well as interspecies recombination between porcine and deer astroviruses [37]. a study of the molecular epidemiology and genetic diversity of human astrovirus in south korea from 2002 to 2007 revealed genotype 1 to be the most prevalent, accounting for 72.19 % of strains, followed by genotypes 8 (9.63 %), 6 (6.95 %), 4 (6.42 %), 2 (3.21 %), and 3 (1.6 %) [38]. this finding suggests that little interspecies (between human and pig) transmission has occurred until now in south korea. in conclusion, this study extends current knowledge of pastv in wild boar and domestic pig. a more extensive study of pastvs in wildlife should be done to further elucidate their potential role in the epidemiological landscape of astrovirus infection in the domestic pig population. in the longer term, continuous surveillance of the prevalence of pastvs in both populations will provide a wider understanding of possible cross-species transmission, in particular to humans. korean pastvs are shown in bold print, and strains isolated from korean and hungarian wild boar are marked with a star and an arrow, respectively. the numbers above the nodes represent posterior probabilities. virus taxonomy. eighth report of the international committee on taxonomy of viruses acknowledgments we are grateful to mr. min-heg lim and ms. sa-ra choi for technical assistance. 
key: cord-311204-fc12f845 authors: zhou, ling; tang, qinghai; shi, lijun; kong, miaomiao; liang, lin; mao, qianqian; bu, bin; yao, lunguang; zhao, kai; cui, shangjin; leal, élcio title: full-length genomic characterization and molecular evolution of canine parvovirus in china date: 2016-04-02 journal: virus genes doi: 10.1007/s11262-016-1309-y sha: doc_id: 311204 cord_uid: fc12f845 canine parvovirus type 2 (cpv-2) can cause acute haemorrhagic enteritis in dogs and myocarditis in puppies. this disease has become one of the most serious infectious diseases of dogs. during 2014 in china, there were many cases of acute infectious diarrhoea in dogs. some faecal samples were negative for the cpv-2 antigen based on a colloidal gold test strip but were positive based on pcr, and a viral strain was isolated from one such sample. the cytopathic effect on susceptible cells and the results of the immunoperoxidase monolayer assay, pcr, and sequencing indicated that the pathogen was cpv-2. the strain was named cpv-ny-14, and the full-length genome was sequenced and analysed. a maximum likelihood tree was constructed using the full-length genome and all available cpv-2 genomes. new strains have replaced the original strain in taiwan and italy, although the cpv-2a strain is still predominant there. however, cpv-2a still causes many cases of acute infectious diarrhoea in dogs in china. canine parvovirus type 2 (cpv-2) belongs to the genus protoparvovirus and the parvoviridae family and was first observed by electron microscopy in 1977 [1] . cpv-2 has three antigenic variants: types 2a, 2b, and 2c. two antigenic variants, cpv-2a and cpv-2b, are now distributed worldwide [2] . a third cpv-2 variant, which was initially believed to be a glu-426 mutant and subsequently renamed cpv-2c, was detected in italy in 2000 [3] and is now circulating there together with types 2a and 2b [4] [5] [6] . the new type 2c has also been reported in vietnam by nakamura et al. 
[7], who developed monoclonal antibodies specific for that type. type 2c has also been reported in the united states and south america [8]. the genome of cpv-2 is a single-stranded dna molecule that contains two open reading frames (orf1 and orf2). orf1 is located in the 5' region of the genome and encodes non-structural proteins 1 (ns1) and 2 (ns2). orf2 is located in the right part of the genome and encodes the structural proteins vp1 and vp2. the middle region of the genome also contains a 500 bp sequence [9, 10]. the non-structural proteins ns1 and ns2 are associated with viral replication. ns1 functions as a nickase and helicase and can form a covalent bond at the 5' end of dna. structural proteins vp1 and vp2 compose the nucleocapsid of cpv-2 [11]. edited by william dundon. ling zhou, qinghai tang, lijun shi, and miaomiao kong have contributed equally to this work. in 2014 in china, many acute cases of infectious diarrhoea were reported in dogs. to identify the pathogen, we collected and examined faecal samples from symptomatic dogs. we report here that the disease is associated with a strain of parvovirus, and phylogenetic characterization of the causal isolate showed that isolates from china have long branch lengths; this may indicate an extensive process of accumulation of mutations and substitutions in chinese lineages. this study provides the basis for further exploration of cpv-2 variation, the selection of vaccines, and the effective prevention and control of cpv-2 infections. faeces were collected from dogs exhibiting acute haemorrhagic enteritis. the faecal samples (10 g) were immersed in 10 ml of sterile phosphate-buffered saline (pbs) with antibiotics, and the suspension was centrifuged at 10,000 × g for 15 min. the supernatant was filtered and frozen at -80°c. to determine whether cpv-2 was present in the samples, the filtered supernatants were added to the colloidal gold test strips using the bionote rapid test kit (korea, bio-note). 
the samples were mixed with buffer and then tested for the antigen at room temperature according to the manufacturer's instructions. the negative sample was prepared for pcr as follows. the pcr assay was performed in a 20-µl reaction volume that included 2.4 µl of template, 1.8 µl each of vp2-f and vp2-r (table 1), 2.0 µl of dntps, 1.0 µl of kod fx neo polymerase (1 u/µl) (toyobo biotechnology company, shanghai, china), 5.0 µl of 2× pcr buffer for kod fx neo (toyobo biotechnology company, shanghai, china), and sufficient ddh2o to bring the volume to 20 µl. amplification was carried out in a pre-heated thermocycler (applied biosystems 2720 thermal cycler) as follows: one cycle at 94°c for 5 min; followed by 35 cycles at 94°c for 45 s, 50°c for 60 s, and 72°c for 60 s; and a final extension at 72°c for 10 min. amplicons were detected by electrophoresing 10-µl aliquots in 2 % agarose gels in 1× tae [40 mm tris-acetate (ph 8.0), 1 mm edta]. identification of cpv-2 with cell culture and ipma. the feline kidney cell line f81 was obtained from the american type culture collection, usa, and was used to isolate viruses from clinical samples and to observe cytopathic effects associated with viral replication. the treated samples were inoculated onto a cell monolayer at 1 ml per 25 cm^2 flask. after adsorption for 1 h at 37°c, the inoculum was removed, dmem with 2 % foetal bovine serum was added, and the cells were again incubated at 37°c. cell cultures were observed daily for 4-5 days to monitor the appearance of cytopathic effects (cpe). subsequently, an ipma was conducted, and the other flask was frozen at -20°c and, after three freeze-thaw cycles, submitted to further passages following the same procedure until the eventual appearance of cpe [3, 12]. 
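two practical quantities are implicit in the 20-µl pcr set-up and cycling program described earlier in this section: the volume of water needed to reach the final volume, and the nominal run time of the program. the python sketch below derives both from the listed component volumes and step durations. it is a worksheet-style calculation only; the run time ignores ramping between temperatures, which depends on the instrument, and the names used are illustrative.

```python
REACTION_VOLUME_UL = 20.0

# component volumes listed in the methods (microliters)
components_ul = {
    "template": 2.4,
    "vp2-f primer": 1.8,
    "vp2-r primer": 1.8,
    "dntps": 2.0,
    "kod fx neo polymerase": 1.0,
    "2x pcr buffer": 5.0,
}

# water makes up the difference; round to pipetting precision (0.1 ul)
water_ul = round(REACTION_VOLUME_UL - sum(components_ul.values()), 1)

# cycling program from the methods, in seconds
INITIAL_DENATURATION_S = 5 * 60
CYCLES = 35
CYCLE_STEPS_S = (45, 60, 60)  # 94 C, 50 C, 72 C
FINAL_EXTENSION_S = 10 * 60

# nominal hold time only; temperature ramping is not included
total_seconds = (
    INITIAL_DENATURATION_S
    + CYCLES * sum(CYCLE_STEPS_S)
    + FINAL_EXTENSION_S
)
total_minutes = total_seconds / 60

print(f"water to add: {water_ul} ul")
print(f"nominal run time: {total_minutes:.2f} min")
```

the nominal hold time comes to a little under two hours, so the actual run on a conventional block cycler would be somewhat longer once ramp times are included.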
cloning the full-length genomic sequence of cpv-2
dna and rna were extracted from the homogenized samples (faeces from infected dogs) with the tianamp virus dna/rna kit (beijing tiangen biotech company, beijing, china) according to the manufacturer's protocol. dna and rna were analysed by pcr or rt-pcr, and the presence of other potential pathogens, such as canine distemper virus (cdv), canine adenovirus (cav), canine coronavirus (ccv), and canine rotavirus (crv), was investigated. table 1 summarizes the sequences of primers used to amplify the virus, and the pcr was conducted as previously described [13] [14] [15]. these primers were the same set used in calderon et al. [13], chaturvedi et al. [14], and pratelli et al. [15]. dna fragments were then cloned into the pmd18-t simple vector (takara biotechnology co., ltd., japan). the same set of pcr primers was used to sequence the full-length genome of the isolate ny-14. sequencing reactions were performed by a commercial corporation (huada, beijing). sequences were aligned using clustal x software version 2.1 [16]. after the first alignment, the sequences were manually edited to maintain the reading frames using the se-al programme version 2.0 (http://evolve.zoo.ox.ac.uk/software/). the following cpv-2 references were used for the phylogenetic analyses:
maximum likelihood trees and bootstrap values were obtained using phyml software [17]. the hky model with gamma distribution (Γ) was selected according to the likelihood ratio test (lrt) using jmodeltest software [18, 19]. the resulting trees were visualized and edited using figtree (http://tree.bio.ed.ac.uk/software/figtree/). to determine the extent of recombination in the cpv-2 sequences, rdp v.4 software [20], which implements a collection of detection methods, was used. a detailed explanation of each method implemented in the rdp programme can be found in the user's manual (http://darwin.uvigo.es/rdp/rdp.html).
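to make the substitution model mentioned above more concrete, here is a minimal sketch that builds an unnormalised rate matrix, assuming the selected model is hky85 (hasegawa-kishino-yano). the base frequencies and the transition/transversion ratio used below are hypothetical placeholders rather than values estimated in this study, and gamma-distributed rate heterogeneity across sites is not modelled.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(a, b):
    """A<->G and C<->T are transitions; every other change is a transversion."""
    return (a in PURINES) == (b in PURINES)


def hky85_rate_matrix(pi, kappa):
    """Build the unnormalised HKY85 rate matrix.

    pi    -- dict of equilibrium base frequencies (should sum to 1)
    kappa -- transition/transversion rate ratio
    Off-diagonal rate a->b is kappa*pi[b] for transitions and pi[b] for
    transversions; each diagonal entry makes its row sum to zero.
    """
    q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                q[a][b] = (kappa if is_transition(a, b) else 1.0) * pi[b]
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    return q


# Hypothetical parameters, for illustration only.
example_pi = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
example_kappa = 2.0
Q = hky85_rate_matrix(example_pi, example_kappa)
for a in BASES:
    print(a, [round(Q[a][b], 3) for b in BASES])
```

in practice, programs such as phyml estimate these parameters from the alignment; the sketch only shows how the two parameter classes (base frequencies and kappa) enter the matrix.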
the ill dogs showed acute infectious diarrhoea, and the faecal samples were bloody stools (fig. 1a). the colloidal gold test strip showed positive (fig. 1b1) and negative (fig. 1b2) results, while the pcr results were both positive (fig. 1c). one viral strain was isolated from the faeces of dogs that tested negative in the rapid test but positive using pcr. cpe was observed in f81 cells after three passages. cpv-2-infected cell cultures showed vast regions of cell detachment (fig. 2). the ipma was conducted with positive sera (fig. 3). pcr amplification showed a fragment of 1755 bp corresponding to the vp2 gene of cpv-2. pcr amplification detected the presence of cpv-2 but was negative for other pathogens in the samples analysed. the sequencing results showed that the vp2 gene of cpv-ny-14 is 1755 bp long with no insertions or deletions in the coding region. the results were once again positive for cpv-2. three experimental animals were inoculated with ny-14 and exhibited clinical symptoms at 14 days post inoculation. cpv-2 was re-isolated from the faeces of the experimental dogs. the whole genome was sequenced, and the complete genome structure of cpv-ny-14 was analysed and is shown in fig. 4. a maximum likelihood tree was constructed using the vp2 sequence and full-length genomes of cpv-ny-14 and all available cpv-2 genomes. figure 4 shows that all the cpv-2 sequences from china clustered in a group, and the cpv-ny-14 isolate is also located in the chinese clade. interestingly, located at the base of the chinese clade is one isolate from russia (jn033694) sampled in 1993. none of the sequences used to construct the above-mentioned tree showed evidence of recombination. to further examine cpv-2 isolates, we used a larger dataset composed only of vp2 protein sequences. the maximum likelihood tree again showed a monophyletic cluster containing all cpv-2 isolates from china.
cpv-ny-14 belongs to the cpv-2a type, and it was inferred that cpv-2a is still circulating in china and is also the main agent of acute haemorrhagic enteritis in dogs. in addition, both trees showed that isolates from china have long branch lengths; this may indicate an extensive process of accumulation of mutations and substitutions in chinese lineages.
since it emerged in 1978, canine parvovirus has spread in the domestic and wild canine population, where it is continuously evolving into new adaptive viral variants. the variability and the intrinsically high mutation rate of the cpv-2 genome allowed the diversity of cpv-2 to rapidly increase as the virus spread through canine populations [21]. because some viral variants replicate more successfully than others, the virus population changes over time [22]. cpv-2a and cpv-2b are the best examples of variants with a fast spread and replacement capacity since the emergence of cpv-2 [9, 23]. during 2014, there were many cases of acute infectious diarrhoea in dogs. some samples tested negative for the cpv-2 antigen by a colloidal gold test strip, while they were positive for cpv-2 by pcr. one viral strain was isolated from the faeces of dogs that tested negative in the colloidal gold test strip but positive with pcr. cpv-ny-14, which was described in this study, was found in dogs that exhibited acute haemorrhagic enteritis. based on the pcr results, the only virus consistently detected in the samples was cpv-2. the serological method (colloidal gold test strip) failed to detect the virus in some samples in which pcr detected cpv-2, and explaining this discrepancy is a valid and important objective. it may be due to differences in test sensitivity (pcr is more sensitive than the serological test) or to a change in the viral genome that leads to a major change in the antigen (viral protein), so that the antibodies in the serological test fail to recognize the viral antigen.
this question needs to be addressed by future experiments. the isolated virus was then cultivated in f81 cells, and we next inoculated adult dogs with the isolated cpv-2. these animals presented clinical conditions characteristic of cpv-2 infection. pcr was used to demonstrate that cpv-2 was the agent causing the disease in these experimental animals. cpv-2 is pandemic, and the frequencies of the different antigenic types of cpv-2 vary in different countries. in uruguay, for example, cpv-2c is the major epidemic strain [24]. in the usa and southern africa, cpv-2b is the main viral type that causes most outbreaks of cpv-2 infection [23, 25]. in the uk, both cpv-2a and cpv-2b are present, and germany and spain have similar frequencies of isolation [26, 27]. in india, the major epidemic strains are cpv-2a and cpv-2c. the new strains have also replaced the original one in taiwan and italy, although the cpv-2a strain is still predominant there [28, 29]. however, cpv-2 still causes many cases of acute infectious diarrhoea in dogs in japan [30]. in this study, molecular phylogenetic analysis of cpv-ny-14 and other cpv-2 isolates in genbank revealed that cpv-ny-14 is closely related to isolates s5, sc02-11, lz1, lz2, and nj01-06, especially isolate lz2. cpv-2a still causes many cases of acute infectious diarrhoea in dogs in china. the sequence analysis showed that there is little variation, and this is important in choosing a vaccine to prevent this disease. given the results obtained in the current study, the continuous surveillance of cpv-2 in china is imperative for determining whether cpv-2a will colonize and spread into new territories.
diarrhea in puppies: parvovirus-like particles demonstrated in their feces
clustal w and clustal x version 2.0
proc. natl. acad. sci. usa
acknowledgments this work was partly supported by the agricultural science and technology innovation programme of china (astip-ias15).
conflict of interest the authors declare that they have no competing interests.
key: cord-266634-bww62vx8 authors: gopinath, m.; shaila, m. s. title: evidence for n(7) guanine methyl transferase activity encoded within the modular domain of rna-dependent rna polymerase l of a morbillivirus date: 2015-10-07 journal: virus genes doi: 10.1007/s11262-015-1252-3 sha: doc_id: 266634 cord_uid: bww62vx8
post-transcriptional modification of viral mrna is essential for the translation of viral proteins by cellular translation machinery. due to the cytoplasmic replication of paramyxoviruses, the viral-encoded rna-dependent rna polymerase (rdrp) is thought to possess all activities required for mrna capping and methylation. in the present work, using partially purified recombinant rna polymerase complex of rinderpest virus expressed in insect cells, we demonstrate the in vitro methylation of capped mrna. further, we show that a recombinant c-terminal fragment (1717–2183 aa) of l protein is capable of methylating capped mrna, suggesting that the various post-transcriptional activities of the l protein are located in independently folding domains.
the presence of cap structure at the 5′ end of mrna prevents the mrna from degradation by cellular rnases and also plays an important role in the translatability of mrna [5]. this di-nucleotide structure is also methylated to various extents in different organisms, and methylation of the first base at the n7 position of the guanine residue results in cap 0 structure; methylation of the penultimate base at the 2′ hydroxyl group results in cap 1 structure. cellular mrna capping and methylation occur by an orderly series of events carried out by rna triphosphatase, guanylyl transferase, n7 guanine methyl transferase and 2′-o-methyl transferase, respectively [for a detailed review, see ref. 10].
paramyxoviruses constitute a group of viruses with single-stranded negative sense rna genome that includes potential pathogens to humans and domestic livestock. the viral genome consists of a ~16 kb long negative sense rna encapsidated by nucleocapsid protein (n-rna). transcription of viral n-rna occurs in an orderly fashion from the 3′ end of n-rna; 3′-le-n-p-m-f-h-l-tr-5′. excluding the 52 nt 3′ leader rna, all the other mrnas are capped and methylated similar to cellular mrna. viruses of this family replicate inside the cytoplasm of the infected cells and hence are not dependent on the host enzymes located in the nucleus for the post-transcriptional modification of viral mrna. during transcription, the viral mrnas are capped similar to the cellular mrna, although the extent of methylation differs within the viruses belonging to this family. the direct evidence for the belief that the large protein l of paramyxoviruses is responsible for mrna synthesis, capping and cap methylation came from the work of ogino et al. [12], who showed that the recombinant l protein of sendai virus possesses guanine-7-methyl transferase activity located within the c-terminal part of l protein. multiple sequence alignment and secondary structure analysis predicted the presence of a 2′-o-methyl transferase domain in the c-terminal domain of l protein of mononegavirales [6]. rinderpest virus (rpv) is an important member of the morbillivirus genus, in the paramyxoviridae family. we have earlier shown that in rpv-infected cells, the viral mrna is capped and has both cap 0 and cap 1 structures [8]. in addition, we also demonstrated the guanylyl transferase activity of l protein in vitro. in the same study, domain mapping revealed the ability of a truncated l protein (ld3, aa 1717-2183) to catalyse the first step of guanylyl transferase activity, namely formation of a covalent complex with gmp.
in the present work, we present evidence for n7 guanine methyl transferase activity of rpv l protein and further demonstrate that this activity is localized to aa 1717-2183 of l protein, indicating the modular nature of the rdrp.
spodoptera frugiperda (sf21) insect cells were cultured and maintained as described earlier [8]. generation of recombinant baculoviruses expressing rpv l (full length), p and domain iii (ld3, aa 1717-2183) has been described earlier [7, 8]. partial purification of l-p complex from insect cells infected with recombinant baculoviruses expressing rpv l and p proteins has been reported earlier [7]. rpv ld3 protein was purified from the insoluble fraction of respective baculovirus-infected cells using high salt extraction as described previously [8].
generation of 6b capped mrna substrate for methyl transferase assay
for in vitro methyl transferase assay, cap-labelled mrna substrate was prepared as described earlier [8], except that the substrate was a consensus 6 nt sequence representing rpv viral mrnas (fig. 1c). the following primer pair with a t7 promoter sequence was used for in vitro transcription followed by capping with vaccinia virus guanylyl transferase; for: 5′-gat cct tat agt gag tcg tct ta-3′, rev: 5′-taa tac gac tca cta ta. for in vitro methyl transferase assay, the cap-labelled rna substrate (5000 cpm) was incubated with the indicated concentrations of enzyme source in methylation buffer containing 25 mm hepes-koh, ph 7.2, 1 mm dtt, 10 mm nacl, 10 u of human placental rnase inhibitor and 50 µm of s-adenosyl methionine (sam) in a total reaction volume of 5 µl. after incubation for 2 h at 30°c, the total reaction mix was adjusted to 50 mm sodium acetate ph 5.2, 5 mm mgcl2 and 5 µg of nuclease p1 in a total volume of 10 µl and incubated at 55°c for 1 h. 5 µl of the reaction products was spotted onto a pei-tlc sheet and subjected to chromatography with 0.45 m ammonium sulphate as the solvent system.
cap structure analogues (gpppa, m7gpppa and m7gpppam) were run in parallel and detected by uv shadowing.
in our previous study, using in vitro reconstituted transcription with purified rpv virions, the viral mrna was found to possess cap 1 structure, indicating a viral-encoded capping enzyme [8]. further, the virion-associated capping activity was localized to l protein [8]. sequence alignment of rpv l protein with 2′-o-methyl transferase from other species revealed the conservation of the kdke tetrad, suggesting the presence of this motif in domain iii of rpv l protein (fig. 1a). in addition, we also found the s-adenosyl methionine (sam) binding motif gxgxg within residues 1789-1795, conserved across the morbillivirus genus (fig. 1b). considering the presence of both the kdke tetrad, responsible for 2′-o-methyl transferase activity, and the gxgxg motif for sam substrate binding, domain iii could likely represent the methyl transfer module (both n7-guanine and 2′-o-methyl) of rpv l protein. this is in agreement with other studies with vsv as well as sendai virus, where the methyl transferase activity was mapped to the c-terminal half of l protein [11, 12]. to investigate the role of l-p complex in viral mrna cap methylation, a 6b rna template representing the first 6b consensus sequence of all species of rpv viral mrna (aggauc) was synthesized in vitro using t7 rna polymerase (fig. 1c). the viral rna was capped using vaccinia virus guanylyl transferase enzyme. capped rna was seen to co-migrate with the xylene cyanol marker and the unused [α-32p]gtp was seen near the bromophenol blue position (fig. 2a, lane 1), while a reaction lacking the guanylyl transferase enzyme showed only the gtp (lane 2). the capped rna was further gel purified and used as a substrate for in vitro methyl transferase assay. we have earlier partially purified transcriptionally active and capping competent l-p complex from insect cells using glycerol gradient fractionation.
given the high molecular weight and oligomerization nature of l-p complex, only high-density glycerol fractions contained both l and p proteins; these fractions are usually devoid of insect or baculoviral methyltransferase activity [8]. to test whether rpv l protein possesses methyltransferase activity in addition to capping, we incubated 6b capped viral rna with partially purified l-p complex. digestion of the substrate alone with p1 nuclease released a product, which co-migrated with the gpppa (cap) marker (fig. 2b, labelled as c). incubation of the capped rna with the partially purified l-p complex from insect cells resulted in a concentration-dependent n7 guanine methylation of the 6 bp substrate (fig. 2b, marked as rl), which was not detected in a mock-purified high-density fraction from insect cells infected with non-recombinant baculovirus (fig. 2b, mock). however, higher concentrations of rl led to the appearance of a slower migrating spot, likely due to the increase in glycerol concentration present in the reaction mix, leading to aberrant migration of m7gpppa. further, to functionally validate the methyl transferase activity of domain iii (aa 1717-2183, ld3) of rpv l protein, ld3 was purified from insect cells using metal chelate affinity chromatography as described earlier [8]. figure 2c shows the purity of recombinant ld3 protein in the eluted fraction (lane 4).
virus genes (2015) 51:356-360
fig. 1 a [...] from genbank and subjected to clustalw analysis and viewed with gsview 8.14. the kdke tetrad is marked by asterisks. alpha helices and beta sheets are marked by bars and arrow marks, respectively. b alignment of l proteins from morbillivirus genus shows the conservation of sam binding motif gxgxg (in bold). c sequence alignment of the 5′ ends of rpv viral mrnas. only the first eight bases are shown. consensus sequence between the viral mrnas is given in bold
incubation of ld3 alone was able to catalyse the n7 guanine methylation of a 6 bp cap-labelled substrate in a dose-dependent manner, suggesting the presence of an n7 methyl transferase domain within this region (fig. 2d). however, no products co-migrating with m7gpppam were observed, indicating the lack of 2′-o-methyl transferase activity with domain iii or with l-p complex in our preparation.
though l protein is believed to possess all the activities required for the post-transcriptional modification of the viral mrna, due to its size, it has been proposed to function in a modular fashion to carry out different enzymatic activities associated with viral mrna synthesis and maturation [3]. in agreement, a putative 2′-o-methyl transferase (mtase) motif was predicted within domain vi (1753-1830 aa) of l proteins [1]. in another report, a structural homology-based comparison was carried out between the bacterial 2′-o-mtase rrmj and the region spanning 1644-1842 aa of vsv l protein, and further mutational analysis revealed the importance of this region in viral mrna transcription as well as methylation [6]. however, recent evidence points to the importance of regions in domain ii of vsv l protein in both cap 0 and cap 1 methylation [9]. in the present study, we have shown that rpv l domain iii alone could catalyse the methylation of gpppa, which obviates the need of domain ii for cap 0 methyl transferase activity. in support of this observation, ogino et al. [12] have shown that a sendai virus l protein deletion mutant spanning domain iii alone (aa 1756-2228) catalyses cap 0 methyl transferase activity, while inclusion of a portion of domain ii (aa 1121-2228) resulted in significantly higher activity.
these results suggest that in paramyxovirus l protein (compared to rhabdoviruses), the catalytic module for cap 0 methyl transferase activity resides in domain iii, and domain ii may have the additional role of stabilizing the enzyme or increasing the catalytic efficiency. we provide evidence for the modular nature of rpv l protein in terms of domain iii alone participating in viral mrna cap methylation. although the rpv l protein was found to possess the kdke motif, the catalytic motif for 2′-o-methyl transferases, the generation of cap 1 (m7gpppam) product could not be seen. one likely reason could be that the presence of cap 0 is a mandatory prerequisite for rpv l protein to generate cap 1 structures. in support of this, coronavirus nonstructural protein 16 was found to exhibit 2′-o-methyl transferase activity only on n7gpppa substrate rna, while flavivirus ns5 methyl transferase can catalyse the methylation of both gpppa and n7gpppa substrates [2, 4]. alternatively, lack of domain ii may render domain iii catalytically inactive with respect to 2′-o-methylation [9]. hence, it would be interesting to speculate that rpv l protein may also require a specific n7gpppa substrate rna to exhibit 2′-o-methyl transferase activity, although further experiments are needed to confirm this hypothesis.
[1] in silico identification, structure prediction and phylogenetic analysis of the 2′-o-ribose (cap 1) methyl transferase domain in the large structural protein of ssrna negative-strand viruses
[2] coronavirus nonstructural protein 16 is a cap-0 binding enzyme possessing (nucleoside-2′-o)-methyltransferase activity
[3] independent structural domains in paramyxovirus polymerase protein
[4] structural and functional analysis of methylation and 5′-rna sequence requirements of short capped rnas by the methyltransferase domain of dengue virus ns5
[5] viral and cellular mrna capping: past and prospects
[6] analysis of a structural homology model of the 2′-o-ribose methyl transferase domain within the vesicular stomatitis virus l protein
[7] recombinant l and p protein complex of rinderpest virus catalyses mrna synthesis in vitro
[8] rna triphosphatase and guanylyl transferase activities are associated with the rna polymerase protein l of rinderpest virus
[9] identification of a new region in the vesicular stomatitis virus l polymerase protein which is essential for mrna cap methylation
[10] processing the message: structural insights into capping and decapping mrna
[11] a unique strategy for mrna cap methylation used by vesicular stomatitis virus
[12] sendai virus rna-dependent rna polymerase l protein catalyses cap methylation of virus-specific mrna
acknowledgments this study was supported in part by a grant-in-aid for research from the council for scientific and industrial research, new delhi, india, under the emeritus scientist scheme.
key: cord-263489-i4tkdgy4 authors: suo, siqingaowa; wang, xue; zarlenga, dante; bu, ri-e; ren, yudong; ren, xiaofeng title: phage display for identifying peptides that bind the spike protein of transmissible gastroenteritis virus and possess diagnostic potential date: 2015-05-27 journal: virus genes doi: 10.1007/s11262-015-1208-7 sha: doc_id: 263489 cord_uid: i4tkdgy4
the spike (s) protein of porcine transmissible gastroenteritis virus (tgev) is located within the viral envelope and is the only structural protein that possesses epitopes capable of inducing virus-neutralizing antibodies. among the four n-terminal antigenic sites a, b, c, and d, site a and to a lesser extent site d (s-ad) induce key neutralizing antibodies. recently, we expressed s-ad (rs-ad) in recombinant form. in the current study, we used the rs-ad as an immobilized target to identify peptides from a phage-display library with application for diagnosis. among the 9 phages selected that specifically bound to rs-ad, the phage bearing the peptide tlnmhlfpfhtg bound with the highest affinity and was subsequently used to develop a phage-based elisa for tgev. when compared with conventional antibody-based elisa, phage-mediated elisa was more sensitive; however, it did not perform better than semi-quantitative rt-pcr, though phage-mediated elisa was quicker and easier to set up.
transmissible gastroenteritis virus (tgev) is a member of the coronaviridae family and is a major cause of enteric disease in pigs where it threatens swine production and triggers substantial economic losses in the industry [1] [2] [3] [4]. its genome is composed of positive-stranded rna approximately 28.5-kb in length. the virus consists of four structural proteins: envelope (e), membrane (m), spike (s), and nucleocapsid (n) proteins [1, 3, 5]. non-structural proteins, which comprise two-thirds of the 5′-proximal end, are encoded by open reading frames 1a and 1ab as well as the replicase.
in contrast, the 3′ end of the genome encodes both non-structural and structural proteins (5′-s-3a-3b-e-m-n-7-3′) [6]. the s protein, which induces neutralizing antibodies, is important in the initiation of infection [7] [8] [9] and has been further delineated into four antigenic sites a, b, c, and d, which are located within the n-terminal region of the s protein [8]. among these, only site a and to a lesser extent site d (herein defined as s-ad) are involved in eliciting neutralizing antibodies. recent work demonstrated that recombinant s-ad (rs-ad) was able to induce antibodies capable of neutralizing tgev infection in vitro [10]. attenuated or inactivated tgev vaccines are less than optimal because they are capable of reverting back to virulent phenotypes and generally do not prevent viral shedding. therefore, effective diagnostic tests have become important in virus management and control. phage display is a proven technology for identifying small peptide ligands that can bind specific target proteins [11] [12] [13] [14]. it has been utilized in antibody engineering [15], drug discovery [16], vaccine development [17], and molecular diagnosis. in virology, phage display has been used to identify peptides that interact with several viruses such as bovine rotavirus [18], adenovirus type 2 [19], andes virus [20], sin nombre virus [21], coronavirus [22], and avian h5n1 virus [23]. herein, we use similar technology and advance previous work by using the rs-ad as an immobilized target to select phages with diagnostic potential for tgev from a peptide display library. our results indicate that phages bearing peptide ligands that bind rs-ad can be used to develop a phage-mediated elisa with high sensitivity and specificity to distinguish tgev from other common swine viruses.
biopanning
swine testis (st) cells were purchased from atcc and used to propagate tgev strain pur46-mad [4].
the rs-ad was produced and purified as described elsewhere [10]. a 12-mer phage-display library was purchased from new england biolabs for panning according to published protocols [11, 14, 24] using the rs-ad as a target at a concentration of 10 µg/well. the 96-well plates coated with rs-ad were initially incubated with the phage library (1.5 × 10^11 pfu/ml; 100 µl/well) suspended in tbst (50 mm tris-hcl [ph 7.5], 150 mm nacl, 0.05 % tween-20) for 30 min. subsequent pannings 2, 3, and 4 were performed using incrementally higher concentrations of tween-20. the phage titers of the input, output (elution), and amplified phages were determined as defined by the manufacturer. indirect elisa was used to assess the phages that remained after four rounds of biopanning. either tgev (0.61 mg/ml) or rs-ad (10 µg/well) in 0.1 m nahco3 (ph 8.6) was used to coat 96-well plates at 4°c for 12 h. the next day, the plates were blocked with 1 % bovine serum albumin (bsa) in tbs (tbsb) for 2 h, washed (3×) with tbst, and then incubated with phage (1.5 × 10^12 pfu/ml in 0.1 m nahco3, ph 8.6; 100 µg/well) for 1 h at 37°c. the plates were again washed with tbst, then incubated for 1 h at 37°c with rabbit anti-m13 antibody (1:1000 in tbsb; abcam), followed by horseradish peroxidase (hrp)-conjugated goat anti-rabbit igg antibody (garp; 1:5000 in tbsb, sigma). the od490 was determined in triplicate as previously described [24]. ten phages with the highest affinity for binding rs-ad as determined by elisa were amplified, precipitated with polyethylene glycol-nacl, and then used for dna extraction according to the manufacturer's instructions (new england biolabs). amplification of the genes encoding the exogenous peptides was performed using sense (5′-tcacctcgaaagcaagctga) and anti-sense (5′-ccctcatagttagcgtaacg) m13 primers followed by dna sequencing [14, 24].
the pcr conditions were as follows: 95°c for 5 min, 30 cycles of 95°c for 30 s, 57°c for 30 s, 72°c for 30 s, and a final extension at 72°c for 7 min. to compare the sensitivities of phage-mediated elisa to antibody-mediated elisa, tgev serially diluted in 0.1 m nahco3 (ph 8.6) was coated onto duplicate elisa plates overnight at 4°c followed by blocking with 5 % skim milk for 3 h at rt. the selected phages or unbound phage complexes (negative control) diluted in pbs (1.5 × 10^12 pfu/ml) were added to one set of plates, followed by anti-m13 antibody (1:1000 in pbs + bsa). to the second set of duplicate plates, rabbit anti-tgev polyclonal antiserum serially diluted in pbs + bsa, and normal rabbit serum were added as the primary and control antibodies, respectively. after incubating both sets of plates for 1 h at 37°c followed by extensive washing, garp (1:5000) was added as described above. the od490 values were read on all plates; samples with an od490 ratio (p/n) > 2, where p = od490 (sample - negative standard) and n = od490 (positive control - negative standard), were judged as positive. all experiments were performed in triplicate. the tcid50 of tgev was determined using the reed-muench method, and tgev was adjusted to 0.61 mg/ml in pbs. total rna was extracted from 300 µl of virus (fastgene, china) and dissolved in 20 µl of sterile water. reverse transcription was performed in 20 µl using 2 µl of rna (550 ng/µl), oligo dt as primer, and m-mlv reverse transcriptase as recommended by the manufacturer (takara, china). the resulting cdna (1 µl) was used as a template for pcr in 20 µl which included 0.2 µl of 10× easy taq polymerase (takara, china), 1 µl of dntp (2.5 mm), 10× pcr buffer (1 µl), and 0.2 µl each of sense (5′-cttagtagtaatattttgcatac) and antisense (5′-tatagcagatgatagaattaaca) primers. amplification conditions were as follows: 94°c for 5 min, then 30 cycles of 95°c for 30 s, 47.6°c for 30 s, and 72°c for 40 s, followed by a final extension at 72°c for 10 min.
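the p/n criterion used above to call elisa wells positive can be sketched as follows. this is a minimal illustration that follows the wording of the text literally (background-subtracted sample reading over background-subtracted reference reading, cut-off of 2); the function names and all od490 readings are hypothetical and are not data from this study.

```python
def mean(values):
    """Arithmetic mean of replicate readings."""
    return sum(values) / len(values)


def pn_ratio(sample_od, reference_od, negative_od):
    """Return (sample - negative standard) / (reference - negative standard)."""
    denominator = reference_od - negative_od
    if denominator <= 0:
        raise ValueError("reference reading must exceed the negative standard")
    return (sample_od - negative_od) / denominator


def is_positive(sample_od, reference_od, negative_od, cutoff=2.0):
    """A well is judged positive when its p/n ratio exceeds the cut-off."""
    return pn_ratio(sample_od, reference_od, negative_od) > cutoff


# Hypothetical triplicate od490 readings, for illustration only.
sample_reads = [0.92, 0.88, 0.90]
reference_reads = [0.30, 0.28, 0.32]
negative_reads = [0.10, 0.10, 0.10]

ratio = pn_ratio(mean(sample_reads), mean(reference_reads), mean(negative_reads))
positive = is_positive(mean(sample_reads), mean(reference_reads), mean(negative_reads))
print(f"p/n ratio: {ratio:.2f}, positive: {positive}")
```

averaging the triplicates before forming the ratio is one possible design choice; computing a ratio per replicate and then averaging would be an equally plausible reading of the protocol.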
the amplified fragment was confirmed by dna sequencing. phage specificity was evaluated against the following panel of porcine viruses: tgev (strain hr/dn1) [25], porcine epidemic diarrhea virus (pedv; strain hljby) [26], porcine reproductive and respiratory syndrome virus (prrsv; strain jilintn1) [27], porcine circovirus type ii (pcv2; strain pcv2-ljr) [28], porcine parvovirus (ppv; strain ppv2010) [29], porcine pseudorabies virus (prv; strain kaplan) [23, 30], and porcine rotavirus (prov; isolate dn30209) [31]. all viruses were initially coated at 8 µg/ml then serially diluted in 0.1 m nahco3 (ph 8.6) and subjected to phage-elisa as described above. average od490 values were obtained from three independent experiments. data were collated and the mean ± sd values were determined. arithmetic means were compared between treatment groups using anova (spss 15.0; spss inc., chicago, illinois, usa) followed by duncan's multiple-range test. values of p < 0.05 and p < 0.01 were defined as statistically significant ('*') or highly significant ('**'), respectively.
in this study, we used phage display to select 12-mer peptides that bind rs-ad [12] and that may function for diagnosing tgev infections. after four rounds of panning, rs-ad-specific phages increased 11× from 4.7 × 10^4 in the first round to 5.3 × 10^5 in the fourth round (table 1). following the last screen, we selected 10 phage clones from the original 18 that bound both rs-ad and tgev. this subset was characterized by elisa with respect to their binding efficiencies (fig. 1). pcr amplification and sequencing indicated that nine distinct 12-mer peptides were identified among the 10 phages that were selected (table 2). in contrast to previous reports [14, 22, 24], these peptides exhibited substantial sequence diversity in the number of peptides that bound to rs-ad.
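the enrichment reported for the panning rounds can be checked with simple arithmetic, reading the two reported output values as 4.7 × 10^4 (first round) and 5.3 × 10^5 (fourth round). the variable names are illustrative assumptions.

```python
# Output values for rs-ad-specific phages as reported in the text (table 1).
first_round_output = 4.7e4
fourth_round_output = 5.3e5

# Fold enrichment between the first and the fourth round of panning.
fold_enrichment = fourth_round_output / first_round_output
print(f"fold enrichment: {fold_enrichment:.1f}")
```

the result rounds to roughly 11, which matches the approximately 11-fold increase stated in the text.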
it is not known if this relates to the length of the target protein or to changes made in the panning process to enhance binding specificity. as shown in fig. 2, we selected four (phtgev-sad15, phtgev-sad1/7, phtgev-sad11, phtgev-sad16) of the ten phages with the highest binding affinity to tgev for further testing. the lowest detectable quantity of tgev for the above defined phages was 0.1, 0.3, 0.2, and 0.4 mg, respectively, suggesting that phtgev-sad15 was the most sensitive when used in a phage-based elisa. binding directly to tgev was uncharacteristically better than binding to the rs-ad used in the selection process (figs. 1, 2). this is likely attributable to more complete folding of the native protein or to better accessibility of the binding epitope in the native form. the minimum quantity of tgev required for detection via antibody-based elisa was 0.6 µg (p/n value > 2) (fig. 3), whereas the minimum quantity of tgev required for phtgev-sad15-based elisa was 0.1 µg. this is consistent with the phage-mediated elisa being more sensitive than conventional antibody elisa. a number of elisa-based assays have been developed over the years for detecting tgev, many of which have been directed at differentiating tgev-infected from prcv-infected animals. among the earlier ones, sestak et al. [32] targeted the s glycoprotein of tgev in a competition elisa where recombinant s protein was coated onto plates and used to capture host antibodies. using a monoclonal antibody to epitope d that is specific for tgev, the investigators were able to differentiate the infectious agents. liu et al. [33] cloned and expressed the nucleoprotein (n) to develop an elisa. compared to the virus neutralization assay, they demonstrated 98 % sensitivity and specificity; however, they did not characterize or address the lower level of sensitivity in vitro or in vivo. in 2010, elia et al.
[34] used the recombinant s protein to develop an elisa to assess swine-like tgev coronaviruses in canine hosts. given the novelty of the virus, they were unable to compare it to other assays currently in use. zou et al. [35] used techniques similar to those developed here, i.e., peptide display, to target the m protein of tgev in developing an elisa-based diagnostic test. in this case, the sensitivity of the elisa exceeded that when the phage-mediated elisa and antibody elisa were compared to rt-pcr which targeted a 208-base pair fragment of the s gene, the rt-pcr was the most sensitive of all assays tested. this is not unexpected given the higher sensitivity of pcr assays in general. pcr amplification was positive using cdna equivalents of 0.02 µg of tgev (data not shown). real-time pcr and/or nested pcr would likely have been even more sensitive. in addition, phages expressing peptides that bind to tgev s-ad did not bind to other selected viruses (fig. 4).
table 2 sequences of tgev rs-ad peptides. predicted amino acid sequences were generated for ten selected phages
in summary, we identified peptides that specifically bind to tgev and can form the basis of new diagnostic tests; the sensitivity of phtgev-sad15 was 0.1 µg of tgev. this sensitivity compared well with the antibody-mediated elisa, which had a sensitivity of 0.6 µg, but fell short of the sensitivity of rt-pcr; however, phtgev-sad15 provides a quicker and less costly alternative to rt-pcr. diseases of swine 7th ed the authors declare no conflicts of interest. key: cord-257122-h3zi8k8g authors: lin, chao-nan; chang, ruey-yi; su, bi-ling; chueh, ling-ling title: full genome analysis of a novel type ii feline coronavirus ntu156 date: 2012-12-14 journal: virus genes doi: 10.1007/s11262-012-0864-0 sha: doc_id: 257122 cord_uid: h3zi8k8g infections by type ii feline coronaviruses (fcovs) have been shown to be significantly correlated with fatal feline infectious peritonitis (fip).
despite nearly six decades having passed since its first emergence, different studies have shown that type ii fcov represents only a small portion of the total fcov seropositivity in cats; hence, there is very limited knowledge of the evolution of type ii fcov. to elucidate the correlation between viral emergence and fip, a local isolate (ntu156) that was derived from a fip cat was analyzed along with other worldwide strains. containing an in-frame deletion of 442 nucleotides in open reading frame 3c, the complete genome size of ntu156 (28,897 nucleotides) appears to be the smallest among the known type ii feline coronaviruses. bootscan analysis revealed that ntu156 evolved from two crossover events between type i fcov and canine coronavirus, with recombination sites located in the rna-dependent rna polymerase and m genes. with an exchange of nearly one-third of the genome with other members of alphacoronaviruses, the new emerging virus could gain new antigenicity, posing a threat to cats that either have been infected with a type i virus before or never have been infected with fcov. electronic supplementary material: the online version of this article (doi:10.1007/s11262-012-0864-0) contains supplementary material, which is available to authorized users. feline coronaviruses (fcovs) are large, enveloped, positive-strand rna viruses with a genome of approximately 29,200 nucleotides [1] [2] [3] . the fcovs belong to the genus alphacoronavirus, family coronaviridae, order nidovirales. other members of this subgroup include canine coronavirus (ccov), transmissible gastroenteritis virus (tgev), raccoon dog cov (rdcov/gz43/03), and chinese ferret badger cov (cfbcov/dm95/03) [4] . fcovs are associated with diseases that range from subclinical and/or mild enteric infections to fatal infectious peritonitis [5] . despite the high prevalence of fcovs in feline populations around the world, only 5-12 % of seropositive cats develop feline infectious peritonitis (fip). 
fip is a chronic, progressive, immune-mediated disease in domestic and nondomestic felids. the typical histopathological finding of this disease is systemic perivascular necrotizing pyogranulomatous inflammation [6]. two serotypes that differ in their growth characteristics in tissue culture and in their genetic relationship to ccov and tgev have been identified [7, 8]. type ii fcov is significantly correlated with fip when compared to type i viruses [9, 10]. however, unlike the ubiquity of type i fcov, infection by type ii virus encompasses only a small percentage of the total number of fcov-seropositive cats in different studies [9-16]. type ii fcov is estimated to have diverged from alphacoronaviruses in 1953 [4]. based on partial genomic sequence analysis, type ii fcovs were suggested to result from a double recombination between type i fcov and ccov [17]. like most rna viruses, covs mutate at a high frequency due to the high error rate of rna polymerization. in addition, a unique feature of cov genetics is the high frequency of rna recombination in the natural evolution of this virus [18]. recombination among covs is an attribute of the genus and is thought to contribute to the emergence of new pathotypes, such as severe acute respiratory syndrome cov [19, 20], human cov nl63 (hcov nl63) [21], hcov hku1 [22], and avian infectious bronchitis virus (ibv) [23-26]. to gain better evolutionary insight into type ii fcovs, we analyzed the complete genome of a novel type ii fcov isolate. taking our data together with data from other strains, we discuss the evolution of type ii fcov.
virus and isolation of viral rna
fcov ntu156 was isolated in 2007 from a kitten with naturally occurring fip by the co-cultivation of pleural effusion with feline fcwf-4 cells [27]. after three rounds of purification by limited dilution, the virus was propagated and titrated.
all of the viruses used in this study for sequencing of the complete genome came from a stock virus passaged six times. ntu156 is relatively fast-growing, induces a coronavirus-typical syncytial cytopathic effect and is a type ii fcov [10]. eleven microliters of isolated rna was added to the premix, consisting of 4 µl of 5× rt buffer, 2.5 mmol dntps (geneteks bioscience, inc., taipei), 50 pmol random primer, 0.2 mol dithiothreitol, and 1 µl of 200 u moloney murine leukemia virus reverse transcriptase (invitrogen, ca, usa) in a 0.6 ml reaction tube. this reaction mixture was then briefly centrifuged and incubated at 37°c for 60 min, then at 72°c for 15 min, and finally at 94°c for 5 min. a total of 120 primers for pcr were chosen from relatively conserved regions of the fcov genome. following reverse transcription, 1 µl of the rt reaction mixture was added to 49 µl of the pcr mixture, which consisted of 5 µl of 10× taq buffer, each primer (10 pmol), dntp (2.5 mmol), 2 u of taq dna polymerase (geneteks bioscience, inc., taipei), and 39 µl of 0.1 % depc water. amplification in an abi-2720 thermal cycler (applied biosystems, usa) consisted of 3 min of preheating at 94°c, followed by 35 cycles of denaturation at 94°c for 30 s, annealing at 55°c for 30 s, and extension at 72°c for 1 min, with a final extension at 72°c for 7 min. the viral rna termini were amplified using 3′- and 5′-rapid amplification of cdna ends (invitrogen, usa).
analysis of pcr-amplified products and sequencing
a total of 10 µl of pcr products from each pcr mixture was analyzed by electrophoresis on a 1 % agarose gel (viogene, taipei). amplification products were visualized using uv illumination after ethidium bromide staining. the targeted dna fragments were purified (geneaid biotech, taipei) and sequenced in both directions using an auto sequencer (abi 3730xl, usa).
full-length genome sequencing of ntu156 was performed by single-round pcr with a set of overlapping pcr products (average size 750 bp) that encompassed the entire genome. the complete sequences of ntu156 were then compared with other alphacoronaviruses and the results are summarized in table 1. multiple alignments of nucleic acid sequences were performed by the clustal w method using the megalign program (dnastar inc., wi, usa). phylogenetic analyses were conducted using mega, version 4.0. similarity graphs were prepared with simplot 2.5 software [28]. potential recombination sites were identified using the recombination detection program (rdp) [29]. the full genomic rna sequence of fcov ntu156 comprises 28,897 nucleotides (nts), excluding the 3′ polyadenylated nts. sequence analysis revealed that ntu156 contains conserved open reading frames with an overall genome organization similar to known fcovs (table 2). the overall nucleotide composition is as follows: a, 29.3 %; c, 17.3 %; g, 21.0 %; and t, 32.4 %. the g+c content is 38.3 %. ntu156 possesses the putative transcription regulatory sequence (trs) motif, 5′-cuaaac-3′, at the 3′ end of the leader sequence and preceding each orf (table 2). the two strings of accessory genes identified in all of the known fcovs, i.e., orf 3ab and orf 7ab, were found in ntu156 as well (table 2). however, an in-frame deletion of 442 nucleotides in orf 3c was identified, which resulted in a relatively short gene comprising only 201 nts. the overall sequence comparison revealed that ntu156 was more closely related to known subgroup 1a cov but not 1b within alphacoronaviruses (fig. 1). nucleotide sequence similarity graphs of ntu156 with known type i fcovs, ccovs, and tgevs were created by the simplot software. the results showed that ntu156 was more closely related to type i fcovs from the 5′ end of the genome to position 15,000 and from position 27,500 to the 3′-utr (fig. 2).
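the similarity graphs described above rest on a sliding-window identity calculation. the following is a minimal python sketch of that idea and not simplot's actual algorithm: it assumes two sequences that are already aligned to the same length and reports the fraction of identical positions in each window. the function name `window_identity` and the toy sequences are hypothetical; for the full-length genomes in this study the reported settings were a window of 1,000 bp and a step of 200 bp, which are used here as defaults.

```python
def window_identity(ref, query, window=1000, step=200):
    """Return (window_start, identity) pairs for two pre-aligned sequences.

    Identity is the fraction of window positions where both sequences carry
    the same non-gap character. This is a simplification for illustration;
    it does not apply any substitution model.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(ref) - window + 1, step):
        ref_win = ref[start:start + window]
        query_win = query[start:start + window]
        matches = sum(1 for a, b in zip(ref_win, query_win) if a == b and a != "-")
        results.append((start, matches / window))
    return results


# Toy example with a tiny window so the arithmetic is easy to follow.
for start, identity in window_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5):
    print(start, identity)
```

a drop in identity against one reference that coincides with a rise against another is the kind of pattern that points to a candidate recombination breakpoint, which is why window and step sizes matter: large windows smooth noise but blur the breakpoint position.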
genes located at the 5′ end (nsp1-11) and 3′ end (the n gene through orf7) of ntu156 show consistently high similarity to type i fcovs, whereas from nsp13 through the e sequence, the similarity to canine and porcine covs varies dramatically (fig. 3). these data indicate that ntu156 might have arisen from recombination events between different strains of covs from species other than cats. two possible recombination sites, at approximate positions 14,300 and 27,300, corresponding to the rna-dependent rna polymerase (rdrp) and the m gene, respectively (fig. 4), were further analyzed. phylogenetic trees using the nucleotide sequence of genes for putative proteins and polypeptides of alphacoronaviruses were further constructed. at nsp 1 through nsp 11 (supplementary fig. 1a) and from the n gene (supplementary fig. 1b) to the orf 7ab gene (supplementary fig. 1c), ntu156 was more closely related to type i fcovs. at the nsp 12 (rdrp) and the m gene, ntu156 was not clustered with any known alphacoronaviruses (fig. 3a, f). from nsp13 through the e gene, ntu156 was clustered with ccov (fig. 3b-e). taken together, these data indicated that ntu156 might have evolved from two recombination events with ccov, with the sites of recombination located in the rdrp and m genes. a unique feature of cov genetics is the high frequency of rna recombination both in vivo and in vitro [18]. here, an interspecies recombination between feline and canine cov was identified in a viral strain ntu156, which was isolated from the pleural effusion of a fip cat. this is the first time that evidence for natural recombination has been documented through the complete genome sequence analysis of type ii fcov. in 1998, herrewegh et al., based on partial sequence analysis, first determined that type ii fcovs 79-1146 and 79-1683 originated from a homologous rna recombination event between type i fcov and ccov [17]. the complete sequence of strain 79-1146 was later published in 2005 [1].
when comparing strains 79-1146, 79-1683, df-2, and our ntu156 strain, the only four type ii fcovs that have had their full-length genomes sequenced to date, a common phenomenon was found: all of these viruses arose from a double recombination event between type i fcov and coronaviruses from other species. ntu156 appears to have evolved from a recombination between type i feline and canine cov; however, when we aligned the genes in which recombination took place with other type ii fcovs, genome crossovers with other alphacoronaviruses were noted. when the sequences around the putative recombination sites were examined, i.e., one located in the 5′ region (strain 79-1146) and two in the 3′ region (strains 79-1146 and 79-1683), porcine coronavirus (tgev) was also found to contribute to the evolution of type ii fcov (in addition to ccov) (figs. 3b, 5). this finding is not surprising because the receptor for type ii fcov, feline aminopeptidase n, has been found to serve as a receptor for several covs, including canine, porcine, and human covs [30]. therefore, the interspecies recombination of type i fcov with any of the above viruses might occur in nature. the recombination of type i fcov in cats living in the same household with dogs or living close to pig farms could give rise to type ii fcov.
analysis was performed using mega 4 software and neighbor-joining methods based on 1,000 replicates. bootstrap support values greater than 90 are shown
based on the analysis of the four full genomes of type ii fcov available at present (ntu156, 79-1146, 79-1683, and df-2), type ii fcovs appear to retain type i fcov sequences in their 5′ and 3′ ends. we asked whether the genes located in these regions are indispensable for fcov replication in cats. to answer that question, the amino acid sequences of genes retained at both ends of fcovs, ccovs, and tgevs, i.e., nsp 1 through nsp 11 and the n and orf7 genes, were further aligned and compared (table 3).
in contrast to the greater than 90 % amino acid homology between different strains of fcovs, nsp 2, 3, and 6, as well as the n gene and the orf7a gene of fcovs, exhibited less than 80 % similarity when compared with ccovs or tgevs. this finding indicates that these gene products might possess irreplaceable functions for fcov replication. this might explain why type ii fcovs found in nature harbor genomes that evolved from a double recombination. although the prevalence of type ii fcov is consistently lower (2-11 %) than type i virus around the world (88-98 %) [9-16], our previous study indicates that infection of type ii fcov correlates significantly with fip when compared to type i [10].
a similarity plot was constructed to identify the sequence homology between type i fcovs black, c1je, and uu2 (gray); ccov ntu336 (red); and tgev purdue, m6, and ts (blue). red arrows represent putative recombination regions. a similarity of 1.0 indicates regions that share 100 % nucleotide identity. the similarity calculation was performed using the following parameters: a window size of 1,000 bp and a step size of 200 bp for full-length sequences
as shown in the present study, type ii fcov arises by exchanging a large genome fragment (approximately 12 kb) of type i fcov with other members of alphacoronaviruses. the genes exchanged through this double recombination include nsp 13-16, structural protein s (spike), and accessory protein 3abc. the nsp 13-16 proteins are replication proteins with functions such as helicase activity (nsp 13), nucleoside triphosphatase activity (nsp 13), rna 5′-triphosphatase activity (nsp 13), 3′-5′ exoribonuclease activity (nsp 14), rna cap formation (nsp 14 and nsp 16), and endonuclease activity (nsp 15) [31]. it has been reported that the function of the 3c protein might be crucial for viral replication in the gut but is dispensable for systemic fcov replication [32].
however, s proteins play a crucial role in receptor binding and eliciting protective immunity [33] . through the replacement of nearly one-third of the genome, the new virus might gain new antigenicity, posing a threat to cats that either have been infected with a type i virus before or never have been infected with fcov. fields virology key: cord-328518-umvk59dc authors: lee, dana n.; angiel, meagan title: two novel adenoviruses found in cave myotis bats (myotis velifer) in oklahoma date: 2019-12-03 journal: virus genes doi: 10.1007/s11262-019-01719-2 sha: doc_id: 328518 cord_uid: umvk59dc bats are carriers of potentially zoonotic viruses, therefore it is crucial to identify viruses currently found in bats to better understand how they are maintained in bat populations and evaluate risks for transmission to other species. adenoviruses have been previously detected in bats throughout the world, but sampling is still limited. in this study, 30 pooled-guano samples were collected from a cave roost of myotis velifer in oklahoma. a portion of the dna polymerase gene from adenoviridae was amplified successfully in 18 m. velifer samples; however, dna sequence was obtained from only 6 of these m. velifer samples. one was collected in october 2016, one in march 2017, and 4 in july 2017. the october and march samples contained viral dna that was 3.1% different from each other but 33% different than the novel viral sequence found in the july 2017 samples. phylogenetic analysis of these fragments confirmed our isolates were from the genus mastadenovirus and had genetic diversity ranging from 20 to 50% when compared to other bat adenoviruses. bats make up 20% of all mammals, and they are the second richest mammalian order in respect to number of species [1, 2] . in recent years, bats have emerged as a rich source of novel viruses [3, 4] . 
they have been found to host more zoonotic viruses per species than rodents [5], and have even been documented to harbor viruses from two different viral families simultaneously [6]. viruses in bats can switch hosts to other bat species [4], and bats are known to carry pathogenic viruses that can infect humans, such as rabies, lyssaviruses, nipah and hendra viruses, ebola, and sars coronavirus [7, 8]. however, in most cases bats serve as reservoirs for viruses with immunological tolerance and without transmission to humans [7, 9]. consequently, it is important to first identify viruses housed in bats in order to better understand the ecology of bat-borne viruses and how they are maintained in bat populations, and then to evaluate risks for host transmission to other species. adenoviruses (advs) are double-stranded dna viruses found in vertebrate hosts of many different species [8, 10]. the family adenoviridae consists of five genera [11], with members of the genus mastadenovirus infecting mammals [12]. advs are widespread in the human population and cause a variety of usually minor symptoms, such as respiratory illnesses, conjunctivitis, and gastroenteritis [8]. generally, these viruses are host-specific [13] and thought to have low zoonotic risk [14]; however, chen et al. [15] discovered a novel adenovirus (tmadv) with the ability to infect both monkeys and humans. since bats are known reservoirs of numerous viruses and cross-species transmission has been documented for an adv, it will be useful to know which advs bats carry. adv strains have been found in more than 45 species of bats across their global distribution [6, 8, 12, 16-34], with seven species proposed by the international committee on taxonomy of viruses [35].
these studies represent a start at investigating bat advs, but there is a need for additional studies considering there are over 1300 species of bats [2] and few north american bats have been investigated. in this study, myotis velifer guano samples were tested for the presence of advs. we expected to find advs in m. velifer because that genus had the most advs in a study on 19 bat species in china [8]. guano was collected from m. velifer individuals in washita bat cave (washita co, ok). samples were collected either from a plastic tarp left overnight at the entrance of the cave (in march and july) or after bats were captured and placed in a sterile cup for 1 h (in october). bats were handled following guidelines from sikes et al. [36], and white nose syndrome decontamination protocols were followed [37]. regardless of the method of collection, four guano pellets were pooled in 500 µl of rnalater® and stored at −20 °c. we obtained a total of 120 guano pellets, which provided 30 pooled samples for analysis. dna extraction was carried out with the qiaamp® dna mini kit (qiagen) following the manufacturer's protocol with minor modifications. nested pcr of the partial adenoviridae dna polymerase gene was carried out on each sample following li et al. [8] using primers pol-f (5′ cagcck-ckgtt rtg yag ggt 3′) and pol-r (5′ gchacc aty agc tcc aactc 3′). the cycling profile consisted of 94 °c for 5 min, 30 cycles of 94 °c for 30 s, 48 °c for 30 s, and 72 °c for 30 s, and then a final extension at 72 °c for 5 min. the second round of amplification used 1 µl of first-round pcr product as template, primers pol-nf (5′ ggg ctc rtt rgt cca gca 3′) and pol-nr (5′ tay gac atc tgy ggc atg ta 3′), and the same cycling steps. positive (human adenovirus d dna from american type culture collection) and negative controls were used for each pcr. positive pcr products were purified with the wizard® sv gel and pcr clean-up system (promega).
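the primers above contain iupac degenerate base codes (for example y = c or t, r = a or g, k = g or t, h = a, c or t), so each primer is in practice a pool of related sequences. a small python sketch, using the pol-nr primer exactly as printed above, shows how the degeneracy can be counted and the pool written out. the helper names `degeneracy` and `expand` are hypothetical, and this is only an illustration rather than part of the published protocol.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _clean(primer):
    """Upper-case the primer and drop the spaces used for readability."""
    return primer.upper().replace(" ", "")


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in _clean(primer):
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in _clean(primer)]
    return ["".join(combo) for combo in product(*pools)]


pol_nr = "tay gac atc tgy ggc atg ta"  # second-round reverse primer from the text
print(degeneracy(pol_nr))
for sequence in expand(pol_nr):
    print(sequence)
```

with two y positions, pol-nr stands for four concrete 20-mers. expansion grows multiplicatively with each degenerate position, so listing the pool is only practical for primers with modest degeneracy.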
species verification of bats with positive adv samples was performed using nested pcr with primers sff_145f (5′ gthachgcy cay gchtty gta ataat 3′) and sff_351r (5′ ctc cwg crtgdgcw agr tttcc 3′) from [38] and thermocycler steps consisting of 95 °c for 5 min, 38 cycles of 95 °c for 60 s, 60 °c for 30 s, and 72 °c for 30 s, and a final extension at 72 °c for 10 min to amplify a region of the cytochrome c oxidase gene that is highly diagnostic among bats. sanger sequencing of positive samples for bat and adv identification was performed by oklahoma medical research foundation, and fragments were aligned and manually edited in geneious v. 10.1.3 [39]. adv sequences (131) isolated from other bat species, turkey, canine, bovine, and human advs a, b, c, and d on genbank were added to the final alignment. selection of the model of sequence evolution, maximum likelihood analysis, and calculation of uncorrected p nucleotide distances were performed in mega v. 7.0.26 using all sites, including gaps, and 1000 bootstrap replicates [40]. there was at least one positive sample for each collection date, but only 6 of 18 positive samples had viral dna quantities necessary for successful sequencing. the alignment of our dna sequences with only the recognized viral species followed a hasegawa-kishino-yano model of evolution with a gamma distribution of 0.4847. the maximum likelihood analysis indicated that the advs were of the mastadenovirus genus and our proposed advs form separate clusters in distinct clades (fig. 1). when all sequences from genbank were included in the alignment, the model of nucleotide evolution was the general time reversible model with a gamma distribution of 0.6376 and invariant sites. in this analysis (not shown), the adv sequences did not form clusters according to their host family. this suggests transmission between host species is more common than coevolution with the host.
myotis velifer samples from october 2016 (guano 61) and march 2017 (guano 2) were only 3.1% different from each other, while they were ~33% different from the 4 sequences extracted from july 2017 samples (guano 21, 22, 24, 25). guano 21 was identical to guano 22 and 24, and it differed from guano 25 by only 1 nucleotide. dna sequences from guano 2, 61, and 21 have been deposited in genbank (accession mn240005-mn240007). we do recognize the sequenced fragment is short (241 base pairs) and only provides preliminary viral classification. adv species are designated if the amino acid sequence of the dna polymerase gene differs by > 5%. based on this criterion, we suggest guano 61 and guano 2 are different strains of the same adv species and are further referred to as cave myotis adv1-1 and cave myotis adv1-2. dna sequences from guano samples 21, 22, 24, and 25 represent a separate adenovirus species and are further referred to as cave myotis adv2. there were 2 non-synonymous mutations and 2 synonymous mutations between cave myotis adv1-1 and adv1-2. cave myotis adv1-1 and adv1-2 are most similar to gu226951 isolated from myotis horsfieldii [8] with genetic differences of 21.7% and 23.3%, respectively, and most different from hq529709 isolated from rousettus leschenaultii [21] with genetic differences of 48.45 and 49.2%, respectively. cave myotis adv2 is most similar to mf404977, 80-82, 84-87 isolated from pipistrellus pygmaeus [32] with a genetic difference of 20.2%. cave myotis adv2 is most different from kc692424, 28 isolated from pteropus giganteus [24] and hq529709 isolated from r. leschenaultii [21] by 44.5%. these two new advs are ≥ 27% different from any currently recognized bat mastadenovirus a-g (table 1). when adv sequences were compared to other advs from bats, genetic diversity ranged from 20 to 50%.
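the percentage differences quoted above are uncorrected p-distances, i.e. the proportion of aligned sites at which two sequences differ. a minimal python sketch of that calculation follows. it assumes pre-aligned sequences; the way gaps are handled here (a gap opposite a base counts as a difference, and columns where both sequences have a gap are skipped) is an assumption of this sketch and may not match how mega treats gaps. the function name and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: differing sites divided by compared sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy example: one mismatch in ten aligned sites.
print(p_distance("ACGTACGTAC", "ACGTACGTAT"))

# Scale check using figures from the text: a single nucleotide difference
# across the 241-bp fragment, expressed as a percentage.
print(round(100 * 1 / 241, 2))
```

on a 241-bp fragment each single substitution shifts the distance by roughly 0.4 percentage points, which is one reason the authors describe the classification as preliminary.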
this study demonstrates that there is great genetic diversity of dna viruses within the same species of bats found in the same location, which is relatively uncommon for other vertebrate viruses [8]. we sampled caves during 3 seasons and found greater prevalence of viral dna in m. velifer guano during summer (july; 14/16 samples = 87.5%) than spring (march; 1/4 samples = 25%) or autumn (october; 3/10 samples = 30%). this is the highest percentage of positive adv samples detected in bats to date from a single sampling period. drexler et al. [20] collected guano samples from m. myotis in germany during may, june, and july for 3 years, and the highest percentage of positive samples for 1 sample date (67.5%, 27/40 samples) was obtained in may, with the same frequency in july. it is likely that our high percentage of positive samples from one sample date is because m. velifer give birth to their young in may-june [41]. in summer there are many young bats present with weaker immune systems and a greater risk of lactating females sharing viruses with their young. drexler et al. [20] found a significant increase in prevalence of coronaviruses one month after parturition during summer months, but adv detection was not significantly higher in any particular month within the summer. little work has been done to investigate advs in north american species of bats; however, these studies [19, 29] and ours highlight the importance of identifying viruses housed in bats to better understand viral evolution and how viruses are maintained in bat colonies, and to evaluate risks for host transmission to other species. li et al. [8] found their novel bat adv (btadv-tjm) was capable of infecting several mammalian cells from different species, including humans, which indicates that bat advs possibly have a wide host range.
they also suggest some bat advs have amino acid sequences for structural proteins similar to those in human advs and a high gc content, which suggests bat advs might be an ideal vector for gene therapy and vaccine delivery in humans [8]. future studies should include sequencing the entire viral genomes and isolating the viruses to test possible transfections in other species to better characterize the viruses discovered here. mammal species of the world: a taxonomic and geographic reference how many species of mammals are there? optimizing viral discovery in bats a comparative analysis of viral richness and viral sharing in cave-roosting bats a comparison of bats and rodents as reservoirs of zoonotic viruses: are bats special? short report: molecular detection of adenoviruses, rhabdoviruses, and paramyxoviruses in bats from kenya bats: important reservoir hosts of emerging viruses host range, prevalence, and genetic diversity of adenoviruses in bats bats are 'special' reservoirs for emerging zoonotic pathogens genome analysis of bat adenovirus 2: indications of interspecies transmission family adenoviridae novel bat adenoviruses with low g+c content shed new light on the evolution of adenoviruses molecular evolution of adenoviruses do nonhuman primate or bat adenoviruses pose a risk for human health?
cross-species transmission of a novel adenovirus associated with a fulminant pneumonia outbreak in a new world monkey colony isolation of novel adenovirus from fruit bat new adenovirus in bats novel adenoviruses and herpesviruses detected in bats bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses amplification of emerging viruses in a bat colony isolation of a novel adenovirus from rousettus leschenaultia bats from india detection of adenoviruses in the northern hungarian bat fauna genetic diversity of adenoviruses in bats of china a strategy to estimate unknown viral diversity in mammals metagenomic study of the viruses of african straw-coloured fruit bats: detection of a chiropteran poxvirus and isolation of a novel adenovirus first detection of adenovirus in the vampire bat (desmodus rotundus) in brazil random sampling of the central european bat fauna reveals the existence of numerous hitherto unknown adenoviruses novel bat adenoviruses with an extremely large e3 gene evolution and cryo-electron microscopy capsid structure of a north american bat adenovirus and its relationship to other mastadenoviruses novel coronaviruses, astroviruses, adenoviruses and circoviruses in insectivorous bats from northern china a metagenomic viral discovery approach identifies potential zoonotic and novel mammalian viruses in neoromicia bats within south africa new adenovirus groups in western palearctic bats molecular detection of viruses in kenyan bats and discovery of novel astroviruses, caliciviruses, and rotaviruses detection of diverse viruses in alimentary specimens of bats in macau surveillance for adenoviruses in bats in italy guidelines of the american society of mammalogists for the use of wild mammals in research national white-nose syndrome decontamination protocol. https://s3.amazonaws.com/org.whitenosesyndrome.assets/prod/7a93cc80-b785-11e8-87bb-317452edc988-national_wns_decon_update_09132018 species from feces: order-wide identification of chiroptera from guano and other non-invasive genetic samples geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data molecular evolutionary genetics analysis version 7.0 bats of texas acknowledgements we thank jason shaw, bill caire, linda loucks publisher's note springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. key: cord-012091-3bo88tux authors: ibrahim, madiha salah; watanabe, yohei; ellakany, h. f.; yamagishi, aki; sapsutthipas, sompong; toyoda, tetsuya; abd el-hamied, h. s.; ikuta, kazuyoshi title: host-specific genetic variation of highly pathogenic avian influenza viruses (h5n1) date: 2011-02-17 journal: virus genes doi: 10.1007/s11262-011-0583-y sha: doc_id: 12091 cord_uid: 3bo88tux the complete genome sequences of two isolates, a/chicken/egypt/cl6/07 (cl6/07) and a/duck/egypt/d2br10/07 (d2br10/07), of highly pathogenic avian influenza virus (hpai) h5n1 isolated at the beginning of the 2007 outbreak in egypt were determined and compared with all egyptian hpai h5n1 sequences available in genbank. sequence analysis utilizing the rna from the original tissue homogenate showed amino acid substitutions in seven of the viral segments in both samples. interestingly, these changes were different between cl6/07 and d2br10/07 when compared to other egyptian isolates. moreover, phylogenetic analysis showed independent sub-clustering of the two viruses within the egyptian sequences, signifying a possible differential adaptation in the two hosts. further, pre-amplification analysis of h5n1 might be necessary for accurate data interpretation and identification of distinct factor(s) influencing the evolution of the virus in different poultry species.
during november and december of preceding year and january and february of succeeding year and decline towards warmer weather. human infections are mostly linked to the peaks of the cold-season avian outbreaks. this was repeatedly detected in the following years up to 2009 and further 2010 posing eradication challenges because of such long-term endemicity. in this report, we analyzed our samples that were collected between 2006 peak and the beginning of its decline in 2007. utilizing the h5n1 virus directly from the original tissue without previous propagation showed molecular differences in the virus unique for each species and further different from those reported for other seasons' viruses in seven viral segments. in this article, we directly sequenced and analyzed the complete genome sequences of h5n1 virus from the tissues of two host species; chicken and duck. tissue samples were collected separately from individual dead birds of 10 birds total for each species. the samples were collected in january and march 2007 from house reared ducks (damanhour, el behiera governorate; n = 30) and large scale breeder chicken farm (alexandria governorate; n = 10,000), respectively. all of the ducks and chicken showed severe clinical signs of classical hpai h5n1 [2, 3] with high mortality rates. in order to minimize any possible variations due to laboratory passage either in embryonating chicken eggs (eces) or mdck cell line, total rna was directly extracted from the original clinical materials from chicken and duck samples using trizol reagent (invitrogen, japan) according to the manufacturer's instructions. rt-pcr amplification of the entire genome was performed using sets of specific primers [11] . the pcr products were separated on 1.5% agarose gels, and the fragments of interest were isolated from the gel using a qiaquick gel extraction kit (qiagen, japan). 
the purified full-length dna fragments were cloned using the mighty ta cloning kit (takara, japan) according to the manufacturer's instructions. for individual segments, ten colony-purified plasmids were sequenced by capillary electrophoresis using the applied biosystems genetic analyzer 3130 (applied biosystems, usa). sequence data were assembled using genetyx (software development co, ltd, tokyo, japan) and bioedit [10]. the genbank/embl/ddbj accession numbers for the sequences reported in this article are ab465592-ab465595, ab465620-ab465629, ab468063, and ab468064. sequence analysis showed >99% identity among the sequences derived from each species, so representative sequences from chicken- and duck-derived viruses, cl6/07 and d2br10/07, respectively, were selected for further analysis. analysis of the ha genes showed a high percent identity (>98%) with other sequences from egypt, nigeria, and the middle east. several nucleotide changes (table 1) were detected between cl6/07 and d2br10/07 as well as relative to the reference sequences from egypt available in genbank. these changes were reflected in the amino acid sequences, with three substitutions in cl6/07 and two in d2br10/07. further, the two amino acid substitutions in d2br10/07, asn154 in ha1 and ser207 in ha2 (h5 numbering), were different from those in cl6/07 and the reference sequences for 2006 but the same as in a/bar-headed-goose/qinghai/65/05 (genbank accession no. dq095622). for 2007 and 2008, the cl6/07- and d2br10/07-specific amino acids were detected in sequences available in genbank, but they could not be correlated with species because of the low number of duck reference sequences, except for lys140 (h5 numbering; table 1). the characteristic highly pathogenic sequence of multiple basic amino acids at the cleavage site, gerrrkkrrg, was detected in the ha of both cl6/07 and d2br10/07 viruses. 
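the cleavage-site check described above can be sketched as a simple motif search. this is only an illustration, not the procedure the authors used: the motif string is taken from the text, whereas the function name, the threshold of four consecutive basic residues, and the flanking residues of the toy fragment are assumptions made for the example.

```python
import re

# Motif as reported in the text (upper-cased one-letter amino acid codes).
CLEAVAGE_MOTIF = "GERRRKKRRG"

def find_polybasic_runs(protein, min_basic=4):
    """Return (0-based start, run) for every run of >= min_basic consecutive R/K residues.

    The threshold of 4 is an assumption chosen for illustration only.
    """
    pattern = re.compile(r"[RK]{%d,}" % min_basic)
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]

# Hypothetical HA fragment: only the central motif comes from the text,
# the flanking residues are made up so that the example is runnable.
toy_ha_fragment = "NSPQ" + CLEAVAGE_MOTIF + "LFGA"

has_reported_motif = CLEAVAGE_MOTIF in toy_ha_fragment
polybasic_runs = find_polybasic_runs(toy_ha_fragment)
```

in practice the search would be run on the deduced ha amino acid sequence of each clone; a sequence lacking any long basic run would return an empty list.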
further, the 2-3-neuacgal avian receptor binding preference was maintained in both viruses, expressing the gln226 and gly228 (h3 numbering) [9, 15] . in addition, the lys216 in the ha receptor binding site detected in all clade 2.2 viruses [1] was also detected in our viruses. the alignment of the na sequences added to the differences between the cl6/07 and d2br10/07 (table 1) , where the cl6/07 had one amino acid substitution while the d2br10/07 had two, even though the former had nine nucleotide mutations versus four in the latter. unlike the ha, such amino acid changes were not detected in any of the genbank available sequences for egypt from 2006 to 2010. moreover, the 20 amino acid genotype z dominant deletion in the stalk of the na protein [14] , from residue 49 to 68 resulting in the loss of an n-linked glycosylation site upstream the deletion, was equally detected in cl6/07 and d2br10/07 viruses. phylogenetic analysis was performed using the mega4 software [19] employing the neighbor-joining method on the basis of full nucleotide sequences for the whole genome. estimates of phylogenies were calculated by performing 1000 bootstrap replicates. phylogenetic analysis of the ha gene ( fig. 1 ) and na (data not shown) showed that both isolates belong to clade 2.2 [20] together with other egyptian isolates as well as the isolates from nigeria and middle east. although, cl6/07 and d2br10/07 subclustered far from each other, they were close to the 2006 and 2007 derived egyptian sequences indicating that they have originated from endemic viruses circulating the same year with the duck viruses closer to their ancestors and sub-clustering independently. further, the cl6/07 subclustered together with a/chicken/egypt/c3br11/2007 (genbank accession no. ab551132), which was directly sequenced from the clinical materials without prior amplification. 
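the bootstrap step mentioned above can be illustrated with a small sketch. it does not build neighbor-joining trees as mega4 does; as a stated simplification, it resamples alignment columns with replacement and counts how often two sequences remain each other's closest relatives by p-distance. all sequence names and toy sequences are hypothetical.

```python
import random

def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def resample_columns(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def pair_support(alignment, a, b, replicates=1000, seed=0):
    """Fraction of replicates in which a and b are strictly closer to each
    other than either is to any other sequence (a proxy for clade support)."""
    rng = random.Random(seed)
    others = [name for name in alignment if name not in (a, b)]
    wins = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        d_ab = p_distance(rep[a], rep[b])
        if all(d_ab < p_distance(rep[a], rep[o]) and
               d_ab < p_distance(rep[b], rep[o]) for o in others):
            wins += 1
    return wins / replicates

# Hypothetical toy alignment, not data from the study.
toy_alignment = {
    "chicken_a": "ACGTACGTAC",
    "chicken_b": "ACGTACGTAT",
    "duck_a":    "TCGAACCTTC",
    "duck_b":    "TCGAACCTTG",
}
support = pair_support(toy_alignment, "chicken_a", "chicken_b")
```

a real analysis would rebuild the full tree for every replicate and report, for each node, the percentage of replicate trees containing it, which is what the bootstrap values on fig. 1 represent.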
the 2009-and 2010-derived sequences were quite far from our 2007 sequences indicating that the virus is under continuous genetic evolution in the country. pairwise sequence comparison further revealed several nucleotide changes in the np, m, and ns genes (table 1) . further, the amino acid substitutions in the transmembrane region of m2 protein that are known as a key point for drug resistance [18] were not detected in cl6/07 or d2br10/07. in addition, there was no amino acid changes associated with the amantadine or rimantadine resistance, and all the amino acids were avian-specific except for the val33ile human signature in the np of both viruses [5, 6, 7] . in addition, both isolates contained the glu92 mutation in the ns1 protein, which is a major contributor to virulence of h5n1 viruses [12] . furthermore, the phylogenetic analysis confirmed the separate sub-clustering of the cl6/07 and d2br10/07, except for the ns gene (data not shown). further sequence comparison of the genes encoding the polymerase complex, pb2, pb1, and pa, revealed a number of differences between cl6/07 and d2br10/07, which were further different from those of the genbank egyptian sequences (table 1 ). in contrast to cl6/07, the d2br10/07-derived sequences shared several nucleotides and the leu82ser in the pb1-f2 with the two human isolates from egypt (tables 1). the glu627lys mutation in pb2, which is characteristic of human viruses and of increased pathogenecity and host range as well [5, 17] was equally detected in both viruses. moreover, phylogenetic analysis of the polymerase genes also showed separate subclustering (data not shown). in addition, the polymerase complex appeared to be very distinct from the two nigerian lineages so and ba [8] , but with closer relation to the middle east isolates (data not shown). 
in our analysis, comparative genetic characterization of the eight rna segments of cl6/07 and d2br10/07 together with 2006-2010 genbank egyptian reference sequences showed that both viruses had nucleotide as well as amino acid differences that appeared to be specific for each, except for the m gene that was highly conserved. the na gene appeared to be uniquely maintained where the cl6/07 and d2br10/07-specific mutations were not detected in 2006 till 2010 egyptian reference sequences. moreover, 2007 was linked to more human infections (25 cases) compared to 2006 (18 cases) and 2008 (8 cases) [21] indicating a possible host-dependent molecular adaptation and/or evolution of h5n1 in 2007 for which the host is not yet disclosed. an increase in human infections was detected in 2009 (39 cases) even though reassortmant or new virus entry has not been reported yet for egypt; however, this could be a consequence of the 2008 declared endemicity of the virus in the country. on the contrary, 25 cases were only reported for 2010 [21] reflecting a necessity to unravel the possible transfer host. even though, cl6/07 and d2br10/07 were derived from different cities, they clustered with other egyptian sequences indicating a single origin with a possible different molecular evolution. this further confirms that in contrast to nigeria, and as cattoli et al. [4] , egypt seems to have had a single entry of the virus, which appears to have happened in early 2006 or late 2005. the maintenance of two different viruses in two different species may increase the burden of threat to human, especially where direct contact with different avian species is common. however, d2br10/07 carried more amino acid mutations and shared several nucleotides in the polymerase genes with the human isolate; human/egypt/902782/06 as well as the human signature; leu82ser in pb1-f2 with the isolate; human/egypt/902786/06. 
this may point a possibility that duck could serve as a viral disseminating and/or amplifying host, being closer to the ancestors, and possibly the potential source for human infection. ducks have been shown to have a central role in the generation and maintenance of h5n1 viruses in china [14] , while for egypt, duck-derived sequences are so scarce compared to those of chicken-or human-derived ones even though ducks are extensively reared and consumed, mainly in the country side. considering the diversity of duck susceptibility to hpai h5n1, understanding the role of ducks in the emergence and maintenance of these viruses and its role in viral spread to other poultry species and human is required. the differences in the amino acids detected in our sequences relative to the genbank 2006-2010 available egyptian sequences could result from utilizing the original clinical materials directly for sequence analysis without prior amplification in vivo; ece or in vitro; mdck cells. recently, le et al. [13] showed that the pb2 gene population of h5n1 virus grown in ece or on mdck did not reflect that of the original. together, it seems that this was likely to occur in our analysis where mostly ece-derived viruses were used for the sequence analysis of genbank reference viruses. further, it may indicate that viral variants harbored inside the infected host may differ from those shed outside. moreover, such species-associated changes could have been independently selected during replication in individual birds and/or individual tissue (y. watanabe and m.s. ibrahim, unpublished results) reflecting a possible differential evolution within individual species. human infections are mainly linked to a previous contact with an unknown dead host. 
thus, accurate molecular analysis of h5n1 gene assemblage in different avian hosts, mainly ducks as well as human would improve the detection of the host-associated changes having the potential for viral spread within the human population and identifying the source of infection and the mysterious host behind that. furthermore, our findings highlight an essential need for using the original clinical material as a source for viral sequence analysis to accurately understand the molecular evolution of h5n1 in individual hosts and also to identify sequence changes that may facilitate cross species infection. fig. 1 phylogenetic tree of the hemagglutinin (ha) segments of cl6/07 and d2br10/07, and other genbank egyptian reference viruses. the neighbor-joining trees based on the full-length nucleotides were generated with mega4 with 1000 bootstrap value. bootstrap values over 50% are shown at the tree nodes. the chicken and duck sequences are indicated in bold and underlined. the arrow head points to a/chicken/egypt/c3br11/2007 that represents directly sequenced viral rna without prior amplification. trees are rooted to the a/chicken/egypt/r1/2006 that was isolated in december 2006. 
the scale bar represents the distance unit between sequence pairs
virus genes (2011) 42:363-368
[1] avian influenza virus (h5n1) outbreaks, kuwait
[2] epidemiological findings of outbreaks of disease caused by highly pathogenic h5n1 avian influenza virus in poultry in egypt during
[3] characterization of an avian influenza virus h5n1 egyptian isolate
[4] highly pathogenic avian influenza virus subtype h5n1 in africa: a comprehensive phylogenetic analysis and molecular characterization of isolates
[5] genomic signature of human versus avian influenza a viruses
[6] properties and dissemination of h5n1 viruses isolated during an influenza outbreak in migratory waterfowl in western china
[7] establishment of multiple sub-lineages of h5n1 influenza virus in asia: implications for pandemic control
[8] molecular and antigenic evolution and geographical spread of h5n1 highly pathogenic avian influenza viruses in western
[9] x-ray structures of h5 avian and h9 swine influenza virus hemagglutinins bound to avian and human receptors analogs
[10] bioedit: a user-friendly biological sequence alignment editor and analysis program for windows 95/98/nt
[11] universal primer set for the full-length amplification of all influenza a viruses
[12] pathology, molecular biology, and pathogenesis of avian influenza a (h5n1) infection in humans
[13] selection of h5n1 influenza virus pb2 during replication in humans
[14] genesis of a highly pathogenic and potentially pandemic h5n1 influenza virus in eastern asia
[15] molecular characterization of the hemagglutinin and neuraminidase genes of h5n1 influenza a viruses isolated from poultry in vietnam from
[16] oie, update on highly pathogenic avian influenza in animals (type h5 and h7)
[17] a single amino acid in the pb2 gene of influenza a virus is a determinant of host range
[18] emergence of amantadine-resistant influenza a viruses: epidemiological study
[19] mega 4: molecular evolutionary genetic analysis (mega) software version 4.0
[20] who, continuing progress towards a unified nomenclature system for the highly pathogenic h5n1 avian influenza viruses
[21] who, cumulative number of confirmed human cases of avian influenza a/(h5n1) reported to who
acknowledgments this study was supported by the jsps postdoctoral fellowship for foreign researchers and the grant-in-aid (scientific research (b) (overseas academic research)), japanese society for the promotion of science, japan. open access this article is distributed under the terms of the creative commons attribution noncommercial license which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
key: cord-322475-i29t7ce8 authors: chen, xi; yang, jinxian; yu, fusong; ge, junqing; lin, tianlong; song, tieying title: molecular characterization and phylogenetic analysis of porcine epidemic diarrhea virus (pedv) samples from field cases in fujian, china date: 2012-07-29 journal: virus genes doi: 10.1007/s11262-012-0794-x sha: doc_id: 322475 cord_uid: i29t7ce8 outbreaks of porcine epidemic diarrhea virus (pedv) have been a major problem for the swine industry in china in recent years. in this study, we investigated the molecular diversity, phylogenetic relationships, and protein characteristics of fujian field samples in comparison with other pedv reference strains. sequence analysis of the s1 and sm genes showed that each sample had unique characteristics, and the sample p55 may be differentiated from the others by unique deletions and insertions in the sm gene. phylogenetic analysis based on the s1 or sm gene, both of which show high levels of variation, indicated that each sample was related to a specific reference strain, and this finding was consistent with the protein characterization prediction analysis. the study is useful for better understanding the prevalence of pedv and its prevention and control in fujian. 
porcine epidemic diarrhea (ped) is a devastating swine disease that is characterized by acute enteritis and lethal watery diarrhea, followed by dehydration, and frequently leading to a high mortality in piglets [1] [2] [3] . most of the incidence farms found the disease first in farrowing barns and subsequently 100 % mortality of newborn piglets. the disease was first reported in england in 1971 [4] , and since then, outbreaks of the disease have been reported frequently in europe and asia [5] [6] [7] . since 1990s, the disease has continuous outbreak in pig farms of 26 major cities and provinces in china, causing tremendous economical losses to the swine industry [8] . the causative agent of ped, the porcine epidemic diarrhea virus (pedv), was first described in 1978 [9] . then, a cell culture system was developed for pedv isolation and propagation [10] . pedv is a member of coronavirus genus and the family coronaviridae. the genome consists of a positive-sense, single-stranded rna, with 27-32 kb in size, which can transcribe into several subgenomic mrnas, and encode structure or non-structure proteins in a conserved order [11] . the polymerase gene, which covering 70 % of the genome, encodes the replicase polyproteins. the genes for major structural proteins including the membrane protein (m), the phosphorylated nucleocapsid protein (n), the small membrane protein (sm), and the spike protein (s) are located downstream of the polymerase gene [11] . the s glycoprotein makes up the large surface projections of the virion and plays an important role in the attachment of viral particles to the receptor of the host cell [12] [13] [14] . thus, the s glycoprotein would be a primary target for the development of vaccines against pedv. it is also the major envelope glycoprotein of the virion, which serves as an important viral component to understand genetic relationships of different pedv strains and the epidemiological status of pedv in the field [6, 15, 16] . 
the sm gene is the only accessory gene of pedv. accessory genes are generally maintained, and their loss mainly results in attenuation of the virus in the natural host [17]. for pedv, virulence of the virus can be reduced by altering the accessory gene region in a manner similar to that described for tgev [18], and differences in this gene could be a marker of virus attenuation [19] and a valuable tool for the study of the molecular epidemiology of pedv [8]. in china, pedv was first isolated in 1982 [20], and its prevalence has been a major problem for the swine industry in recent years, although a periodic vaccination strategy has been applied nationwide to prevent the disease [21]. a comprehensive study is therefore necessary to better understand the genetic relationships between different strains; it would help to identify the reason for the continuing outbreaks of pedv and to develop new strategies to control and prevent pedv infection. in this study, we investigated the molecular epidemiology and analyzed the phylogenetic relationships of fujian pedv field samples with other pedv reference strains. the study mainly focused on the s1 and sm genes because of their vital roles in viral function and their higher variation. portions of intestine or stool specimens were taken individually from piglets with acute enteritis and watery diarrhea on 3 different large swine farms in fujian province in 2011, and were designated p55, p68, and f422, respectively. intestinal samples were homogenized with 9 volumes of phosphate-buffered saline (pbs). the suspensions were then vortexed and centrifuged for 10 min at 1,700 × g. the supernatants were stored at -80°c until use. in order to determine the sequences of the pedv samples, primers were designed based on the sequences of reference pedv strains (table 1). because of the length of the s gene, only part of it, i.e., s1, was amplified for investigation. 
in brief, viral rna was extracted from the supernatants of the homogenized samples with the rnaiso plus reagent (takara, japan) according to the manufacturer's instructions. rt-pcr was conducted individually to amplify each fragment from the isolated rna using the primescript one step rt-pcr kit ver.2 (takara, japan) according to the manufacturer's protocol under the following conditions: reverse transcription at 50°c for 30 min, denaturation at 94°c for 2 min, and 30 cycles of denaturation at 94°c for 30 s, annealing at 55°c for 30 s, and extension at 72°c for 1 min. the rt-pcr products were analyzed by 1.5 % agarose gel electrophoresis and visualized by ultraviolet illumination after ethidium bromide staining. bands of the expected size for each gene were excised, and the synthesized dna was purified using a qiaquick gel extraction kit (qiagen, germany) according to the manufacturer's instructions, then sequenced by takara. the nucleotide and deduced amino acid sequences of the s1 and sm genes of the pedv samples were independently used for sequence alignments. the multiple sequence alignments were generated with the clustalw method in megalign 4.0 [22]. phylogenetic trees were constructed from the deduced amino acid sequences by the bootstrap neighbor-joining method. in the study, the characteristics of the deduced amino acid sequences, including pi value, antigenic peptides, hydrophobic positions, and transmembrane motifs, were analyzed with the dnaman program.
sequence analysis of s1 region
the nucleotide sequences of the s1 region are 2,024 bp for p55, 2,032 bp for p68, and 2,036 bp for f422 in length (accession numbers: jq723739, jq723740, and jq723741). the s1 protein of p55 is 620 aa in length with a predicted mr of 68.1 kda; the s1 proteins of p68 and f422 are 522 aa in length with a predicted mr of 57.2 kda. twelve homologous sequences were found in genbank, sharing 99 % similarity (table 2). however, mutations occurred frequently in the s1 gene. 
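the similarity figures quoted above depend on how identity is counted. a minimal sketch, assuming the sequences are already aligned to equal length and that columns containing a gap in either sequence are excluded (one common convention; the text does not state which one was used), could look like this. the function name and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

# Hypothetical aligned fragments, not sequences from the study.
identity_with_mismatch = percent_identity("ACGT", "ACGA")
identity_with_gap = percent_identity("ACGT-A", "ACGTTA")
```

counting gap columns as mismatches instead would lower the second value, which is why the convention should be stated whenever deletions such as those in p55 are involved.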
the alignment analysis indicated that five sequences including p68, f422, ch/ interestingly, most of the mutations were observed in the n-terminal region. these variations in p68 and f422 were probably due to mutation of the gene in field strains. p55 and dr13 constitute another group (group 2, fig. 1) with 8 specific nucleotide changes; these mutations occurred in the middle of the s1 gene. interestingly, c/g and a/t were found to be interchanged (c/g ↔ a/t). the relationships of group 1 and group 2 were verified using their deduced amino acid sequences. the sequences of group 1 were found to have a long deletion at the beginning followed by a short deletion. the mutations of group 2 comprised a deletion at position 157 and a substitution at position 329 (s→f). in terms of potential asparagine (n)-linked glycosylation sites, only 11 sites were found in group 1, far fewer than in group 2 (14 for p55 and 15 for dr13). unlike the result reported by lee et al. [23], neither gtaaac nor a similar sequence was found upstream of the initiator atg of the s gene in any of the chinese and english (cv777) strains. the sm gene of 3 fujian pedv field samples were sequenced (accession number: jq723734 for p55, (fig. 2). p55 and f422 have 7 and 8 unique point mutations, respectively. however, apart from the long deletion in p55, only one amino acid was changed by those mutations (f→l at position 124 in f422, fig. 2). in addition, p55 has one fewer asparagine (n)-linked glycosylation site than the others. all the pedv strains, including the 3 fujian samples but excluding the sm98 strain (accession number: gu937797), have a conserved sequence (ctagac) 46 nucleotides upstream of the initiator atg. in order to analyze the phylogenetic relationships between the 3 fujian samples and other pedv strains isolated in various regions worldwide, we constructed 2 phylogenetic trees using the deduced amino acid sequences of s1 and sm, respectively (fig. 3). 
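potential n-linked glycosylation sites such as those counted above are conventionally predicted from the sequon n-x-s/t, where x is any residue except proline. the sketch below applies that rule; it is an assumption that the authors' software used the same rule, and the function name and toy peptide are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(protein):
    """Return 1-based positions of asparagines in N-X-S/T sequons (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# Hypothetical peptide: N-V-S matches, N-P-T is excluded because of the proline,
# and N-N-T / N-T-S overlap.
toy_peptide = "MNVSANPTNNTS"
sites = sequon_positions(toy_peptide)
site_count = len(sites)
```

comparing the position lists of two strains, rather than only the counts, shows which sites were lost through a deletion or substitution.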
the phylogeny based on the s1 glycoprotein indicated that all the strains clustered into 3 major groups, including one large mixed group (group 1) and 2 chinese groups (groups 2 and 3). p68 and f422 formed a subgroup (subgroup 4) distinct from the other strains. the subgroup comprising dr13 and p55 (subgroup 1) was located in group 1. this result correlated with the findings from the sequence analysis. in contrast to the results for the s1 protein, phylogenetic analysis based on the sm protein fragment divided the strains into 2 groups, one of which included p55 and ch/gsjiii/07 (fig. 3b). the reason might be the deletions that occurred in p55 and ch/gsjiii/07. f422 had a close relationship with dx and formed a subgroup with it, while p68 formed another subgroup. the characterization of the s1 protein confirmed the results of the phylogenetic analysis (table 3). the characteristics of p55 and dr13, except for the number of antigenic peptides, were shown to be greatly different from those of other strains; and the strains f422, p68, ch/fjnd-3/2011, cnu-091222-02, and cv777 shared the similar antigenic peptide, but had one for the sm protein, pi varied from about 6.5 to 11 among the 6 chosen strains (table 4), indicating the potential variation of the protein. it was noteworthy that f422 and dx were highly similar, with identical characteristics except for one hydrophobic region. the similarity between p68 and cv777 was lower than that between dx and f422; the differences involved a small pi variation, one variation in the hydrophobic and transmembrane segments, and amino acid mutations at 3 positions (table 4, underlined). consistent with the phylogenetic analysis, the characteristics of p55 and ch/gsjiii/07 were similar to each other and extremely different from those of the other strains. since sm determines the virulence of pedv [24], our results would benefit research on the variation of virulence of pedv in china. the diversity in s1 and sm was observed to be significant among different strains. 
although there were many mutations in this segment, the first unique characteristic was the deletion in the sm genes of ch/gsjiii/07 and p55. compared to ch/gsjiii/07, p55 was found to be more variable owing to the insertion within the c-terminal domain, the unique point mutations, and the smaller number of asparagine (n)-linked glycosylation sites. the long deletion in the sm gene, which was also found in the field strain dr13 (accession number: jq023161) and its attenuated strain (accession number: jq023162) [25], led to reduced pathogenicity and induced a protective immune response in pigs [24]. remarkably, similar results were found in p55, and no significant mutations were found in the sequences of other structural protein genes including m, n (data not shown), and the s gene; whether the mutated strain has reduced pathogenicity needs further study. the loss of sm resulted in attenuation of the virus in the natural host. however, we found that pedv with the long deletion in the sm gene also caused typical clinical signs of pedv infection; the pathogenesis mechanism of the virus and how the sm mutant strain arose also need to be clarified. in general, the variation in sm gene different from the various diversities of the sm gene, the s1 regions of the 3 fujian samples have unique mutations in common. coronaviruses have transcription regulatory sequences (trss) that include a highly conserved core sequence 5′-cuaaac-3′ or a related sequence upstream of the encoded genes [26]. though the sequences ataaac, agaaac, and ctagac were found upstream of the initiator of the m, n (data not shown), and sm genes, respectively, the sequence gtaaac reported in the korean strains [23] was not found upstream of the s1 gene of the fujian pedv samples. however, the neutralizing epitope in s1 that is responsible for mediating the production of anti-viral neutralizing antibodies was conserved. 
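the search for trs-like core sequences upstream of an initiator atg, as discussed above, can be sketched as a windowed string search. this is an illustrative sketch rather than the authors' procedure: the function name, the 100-nucleotide window, and the toy sequence are assumptions, and the distance is defined here as the number of nucleotides from the first base of the motif to the a of the atg, which may differ from the convention used in the text.

```python
def upstream_motif_hits(seq, atg_index, motifs, window=100):
    """Find motifs lying entirely upstream of the ATG at 0-based atg_index.

    Returns (motif, distance) tuples, nearest hit first. RNA notation (U) is
    converted to DNA notation (T) before searching.
    """
    seq = seq.upper().replace("U", "T")
    start = max(0, atg_index - window)
    hits = []
    for motif in motifs:
        motif = motif.upper().replace("U", "T")
        pos = seq.find(motif, start, atg_index)
        while pos != -1:
            hits.append((motif, atg_index - pos))
            pos = seq.find(motif, pos + 1, atg_index)
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical sequence built so that CTAGAC starts 46 nt before the ATG.
toy_seq = "GG" + "CTAGAC" + "A" * 40 + "ATGAAA"
toy_atg = toy_seq.index("ATG")
hits = upstream_motif_hits(toy_seq, toy_atg, ["CUAAAC", "CTAGAC", "GTAAAC"])
```

an empty result for a given motif, as reported above for gtaaac upstream of s1, only means that no exact match lies within the chosen window.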
phylogenetic trees based on the protein sequences were constructed to analyse the relationship between the fujian samples and the other strains. phylogenetic analysis based on the sm protein indicated that the strain ch/gsjiii/07 was relatively close to p55, but distantly related to group 1. however, park et al. [27] found that ch/gsjiii/07 was in group 1, which differs from our result. this discrepancy might arise because nucleotide sequences were used in the previous study, whereas amino acid sequences were used in this study. the location of p68 and f422 in the tree based on the s1 protein suggested high variation among the fujian samples. dr13 and p55 were within the same subgroup. as dr13 was used to develop the pedv vaccine in korea [28], it might be interesting to know whether p55 can be used to develop a pedv vaccine in china.
table 2 . a tree based on amino acid sequences of s1 protein. b tree based on amino acid sequences of sm protein
the results of the protein characterization prediction confirmed the relationships and demonstrated specific differences between the closely related strains identified by sequence and phylogenetic analysis, which might be useful in further functional exploration. it was noteworthy that the unique hydrophobic region in the n-terminus of the s1 protein of cv777, cnu-091222-02, dr13, and p55 might be related to variation in protein structure and function. in conclusion, the fujian pedv samples were classified into different groups. both p68 and f422 were found to have a close relationship with strains isolated in china, but still had some unique characteristics. p55 had the highest variation and a close phylogenetic relationship with the field strain ch/gsjiii/07. 
the underlines indicate amino acid mutations between strains
[1] experimental infection of pigs with a new porcine enteric coronavirus, cv777
[2] porcine epidemic diarrhea, in diseases of swine
[3] porcine epidemic diarrhoea virus as a cause of persistent diarrhoea in a herd of breeding and finishing pigs
[4] letter to the editor
[5] an immunoelectron microscopic and immunofluorescent study on the antigenic relationship between the coronavirus-like agent, cv777, and several coronaviruses
[6] chinese-like strain of porcine epidemic diarrhea virus
[7] an outbreak of swine diarrhea of a new-type associated with coronavirus-like particles in japan
[8] molecular epidemiology of porcine epidemic diarrhea virus in china
[9] a new coronavirus-like particle associated with diarrhea in swine
[10] propagation of the virus of porcine epidemic diarrhea in cell culture
[11] the genome organization of the nidovirales: similarities and differences between arteri-, toro-, and coronaviruses
[12] the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex
[13] identification of the epitope region capable of inducing neutralizing antibodies against the porcine epidemic diarrhea virus
[14] major receptor-binding and neutralization determinants are located within the same domain of the transmissible gastroenteritis virus (coronavirus) spike protein
[15] sequence analysis of the partial spike glycoprotein gene of porcine epidemic diarrhea viruses isolated in korea
[16] coronaviruses: structure and genome expression
[17] the group-specific murine coronavirus genes are not essential, but their deletion, by reverse genetics, is attenuating in the natural host
[18] efficacy of a transmissible gastroenteritis coronavirus with an altered orf-3 gene
[19] cloning and further sequence analysis of the orf3 gene of wild- and attenuated-type porcine epidemic diarrhea viruses
[20] porcine epidemic diarrhea
[21] molecular characterization and phylogenetic analysis of membrane protein genes of porcine epidemic diarrhea virus isolates in china
[22] the clustal_x windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools
[23] heterogeneity in spike protein genes of porcine epidemic diarrhea viruses
[24] differentiation of a vero cell adapted porcine epidemic diarrhea virus from korean field strains by restriction fragment length polymorphism analysis of orf 3
[25] complete genome sequences of a korean virulent porcine epidemic diarrhea virus and its attenuated counterpart
[26] complete genome sequence of transmissible gastroenteritis coronavirus pur46-mad clone and evolution of the purdue virus cluster
[27] molecular characterization and phylogenetic analysis of porcine epidemic diarrhea virus (pedv) field isolates in korea
[28] cloning and further sequence analysis of the spike gene of attenuated porcine epidemic diarrhea virus dr13
acknowledgments this work was supported by project management for agricultural science and technology achievements transformation fund (2010gb2c400209), science and technology major project of fujian (2010nz0002-3), and national spark program (2011ga720005). we also thank hualong feedstuffs technology and development group company in fujian province for sample collection and advice.
key: cord-303672-ujp78213 authors: gu, wen-yuan; li, yan; liu, bao-jing; wang, jing; yuan, guang-fu; chen, shao-jie; zuo, yu-zhu; fan, jing-hui title: short hairpin rnas targeting m and n genes reduce replication of porcine deltacoronavirus in st cells date: 2019-08-28 journal: virus genes doi: 10.1007/s11262-019-01701-y sha: doc_id: 303672 cord_uid: ujp78213 porcine deltacoronavirus (pdcov) is a recently identified coronavirus that causes intestinal diseases in neonatal piglets with diarrhea, vomiting, dehydration, and post-infection mortality of 50–100%. currently, there are no effective treatments or vaccines available to control pdcov. 
to study the potential of rna interference (rnai) as a strategy against pdcov infection, two short hairpin rna (shrna)-expressing plasmids (pgenesil-m and pgenesil-n) that targeted the m and n genes of pdcov were constructed and transfected separately into swine testicular (st) cells, which were then infected with pdcov strain hb-bd. the potential of the plasmids to inhibit pdcov replication was evaluated by cytopathic effect, virus titers, and real-time quantitative rt-pcr assay. the cytopathogenicity assays demonstrated that pgenesil-m and pgenesil-n protected st cells against pathological changes with high specificity and efficacy. the 50% tissue culture infective dose assay showed that the pdcov titers in st cells treated with pgenesil-m and pgenesil-n were reduced 13.2- and 32.4-fold, respectively. real-time quantitative rt-pcr also confirmed that the amount of viral rna in cell cultures pre-transfected with pgenesil-m and pgenesil-n was reduced by 45.8 and 56.1%, respectively. this is believed to be the first report to show that shrnas targeting the m and n genes of pdcov exert antiviral effects in vitro, which suggests that rnai is a promising new strategy against pdcov infection. porcine deltacoronavirus (pdcov) is a new member of the genus deltacoronavirus that causes intestinal disease. clinical symptoms include vomiting, diarrhea, dehydration, and even death of piglets [1-3]. since the first report of pdcov in hong kong in 2012 [4] and the outbreak of pdcov in the usa in 2014 [5], the novel porcine coronavirus has been detected in canada, south korea, thailand, mexico, and china [3, 6-8]. pdcov has become an important pathogen affecting healthy development in the pig industry. however, there are currently no vaccines or treatments that can effectively control pdcov [6]. pdcov is an enveloped, single-stranded, positive-sense rna virus. the entire genome contains about 25,400 nucleotides [4].
pdcov has four major structural proteins: spike (s), envelope (e), membrane (m), and nucleocapsid (n) [1, 2, 9]. the detailed function of each pdcov protein is unknown. according to studies on other coronaviruses, m protein is the most abundant component of the viral envelope. it plays an important role in viral assembly and budding [10, 11]. the m protein can also induce production of protective antibodies [10, 12]. the n protein of the coronavirus forms a helical nucleocapsid with genomic rna and protects the viral genome from external interference [13, 14]. edited by zhen f. fu. wen-yuan gu, yan li and bao-jing liu have contributed equally to this work. rna interference (rnai) is a process that effectively silences or inhibits the expression of a gene of interest; it is achieved by double-stranded rna (dsrna), which selectively inactivates the mrna of the target gene. rnai has been successfully used in infection inhibition studies of animal viruses, such as porcine epidemic diarrhea virus (pedv) [15], influenza virus a [16], porcine transmissible gastroenteritis virus (tgev) [17], and porcine reproductive and respiratory syndrome virus [18]. however, whether rnai inhibits the replication of pdcov has not been reported. rnai interferes with viral replication through short hairpin rnas (shrnas) or small interfering rnas (sirnas). shrna is more stable, longer-lasting, and more efficient than sirna. therefore, in this study, we constructed two shrnas (pgenesil-m and pgenesil-n) in a plasmid expression system that targeted the m and n genes of pdcov, and investigated the efficiency of shrna-mediated rnai against pdcov replication in vitro. st cells were cultured in dulbecco's modified eagle's medium containing 10% heat-inactivated fetal bovine serum (zhejiang tianhang biotechnology co. ltd., hangzhou, china) and 1% penicillin-streptomycin solution, and incubated at 37 °c in 5% carbon dioxide. the pdcov strain hb-bd (genbank no.
mf948005) was propagated in st cells as previously described [1]. after 80% of the cells developed a cytopathic effect (cpe) of viral infection, the culture (cells plus medium) was collected and subjected to three freeze-thaw cycles to lyse the cells. the virus titer was determined by the 50% tissue culture infectious dose (tcid50) as described previously [6]. candidate shrnas targeting the m and n genes of pdcov were predicted using rnai target finder software (www.ambition.com/techlib/misc/sirna). using the blast program, the candidate shrna sequences were aligned with the pig genome and other pdcov sequences submitted to genbank, and specific, strong candidates were selected. to ensure a similar rnai effect on different pdcov strains, two theoretically effective sequences at nucleotide positions 328-346 (m) and 270-288 (n) were selected. dsdna sequences encoding the shrnas were synthesized, with 4-nt 5′ single-stranded overhangs complementary to bamhi- and hindiii-cleaved dna at the ends. scrambled sirna sequences were designed as negative controls (nc). the sequences are shown in table 1. each ds-shrna-coding sequence was ligated into the bamhi and hindiii restriction sites of the shrna expression vector pgenesil-1 (wuhan genesil biotechnology, china) and transformed into escherichia coli competent cells. the recombinant plasmids were named pgenesil-m, pgenesil-n, and pgenesil-nc. st cells were seeded in a 24-well plate at 2 × 10^5 cells per well in 500 μl medium without antibiotics in a conventional manner and cultured for 18-24 h at 37 °c in a 5% co2 environment. when the degree of monolayer cell confluence reached 70-80%, transfection was performed according to the manufacturer's instructions for lipogene™ 2000 plus transfection reagent (us everbright inc., san francisco, us).
briefly, 2 μg of transfection reagent was diluted with 50 μl of opti-mem and left to stand for 5 min at room temperature, then mixed with 50 μl of diluted shrna-expressing plasmid (1 μg of shrna-expressing plasmid diluted with 50 μl of opti-mem). after 20 min, the cells of each well were washed three times and overlaid with 100 μl of the transfection mixture. the cells were incubated for 4 h at 37 °c, and the medium was changed. after 24 h, the cells were infected with 100 tcid50 of pdcov. st cells that were infected with pdcov after undergoing the transfection procedure without plasmid dna served as mock-transfected controls. expression vector pgenesil-1 encodes the enhanced green fluorescent protein (egfp). therefore, egfp expression in a cell line can be used as a marker of transfection. in this study, we used inverted fluorescence microscopy to capture images to evaluate the cell transfection efficiency and cpe. pdcov cultures in shrna expression plasmid-transfected st cells were harvested 48 h after virus infection. after three repeated freeze-thaw cycles, the virus was diluted from 10^−1 to 10^−10 and added to a 96-well plate with eight wells per dilution of virus. cpe was observed and recorded daily, and viral titer was measured by tcid50 using the reed-muench method as previously described [1, 6]. st cells transfected with shrna recombinant plasmids (pgenesil-m and pgenesil-n) and a scrambled shrna recombinant plasmid (pgenesil-nc) were observed under fluorescence microscopy (fig. 1). normal st cells showed no fluorescence, while st cells transfected with recombinant plasmid showed fluorescence. the shrna recombinant plasmids were successfully transfected into st cells, and the transfection efficiency of the two recombinant plasmids and the scrambled shrna plasmid was similar. to analyze whether shrna can prevent st cells from exhibiting cpe due to pdcov infection, recombinant plasmids pgenesil-m and pgenesil-n were transfected into st cells seeded in triplicate in 24-well plates.
the vector pgenesil-nc expressing the non-specific shrna was used as a negative control. at 24 h after transfection, cells were infected with pdcov at 100 tcid50, and cells were examined for cpe every day. cpe was observed 48 h after infection and photographed (fig. 2). st cells infected with virus only or st cells transfected with a negative control plasmid (pgenesil-nc) became enlarged, round, dense granular cells, occurring individually or in clusters. we also observed signs of cell shrinkage and detachment from the monolayer. in cells transfected with plasmids pgenesil-n and pgenesil-m expressing specific shrnas, the extent of cpe was reduced. to analyze the inhibition of pdcov replication by shrna, viral titers in st cells were calculated by the reed-muench method at 48 h after viral infection (fig. 3). the results showed that the titers of pdcov in cells transfected with pgenesil-m or pgenesil-n were significantly different from the titers in cells transfected with pgenesil-nc (p < 0.05), while the difference between pgenesil-nc and mock-transfected cells was not significant. pgenesil-n showed higher inhibition efficiency than pgenesil-m. if rnai is successful, replication of pdcov is inhibited, and the amounts of the corresponding m and n gene rnas decrease. we used the n gene as a standard to analyze the effect of shrna inhibition of pdcov replication. in the real-time quantitative rt-pcr analysis, the n gene level was normalized to the corresponding β-actin in the same sample (fig. 4). the relative amount of n gene in mock cells was regarded as 1.000, whereas the relative amounts of n gene in cells infected with pdcov after being transfected with pgenesil-m, pgenesil-n, and pgenesil-nc were 0.542, 0.439, and 1.079, respectively. analysis of these data revealed that the amount of viral rna in samples transfected with pgenesil-m and pgenesil-n was reduced by 45.8 and 56.1%, respectively, compared to the mock control.
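the percentages above follow directly from the normalized n gene amounts. the short python sketch below reproduces that arithmetic; the function name `percent_reduction` and the dictionary layout are illustrative assumptions, not part of the original analysis, and only the relative amounts themselves (0.542, 0.439, 1.079, with mock set to 1.000) come from the text.

```python
def percent_reduction(relative_amount, reference=1.0):
    """Percent reduction of a normalized RNA level versus the reference.

    `reference` is the mock-transfected control, which the text sets to 1.000.
    A negative result means the level is higher than in the reference.
    """
    return (1.0 - relative_amount / reference) * 100.0


# relative n gene amounts reported in the text (β-actin normalized, mock = 1.000)
relative_n_gene = {
    "pgenesil-m": 0.542,
    "pgenesil-n": 0.439,
    "pgenesil-nc": 1.079,
}

reductions = {
    plasmid: round(percent_reduction(amount), 1)
    for plasmid, amount in relative_n_gene.items()
}

for plasmid, value in reductions.items():
    print(f"{plasmid}: {value:+.1f}% change in reduction terms")
```

the negative value obtained for pgenesil-nc simply reflects that its relative amount (1.079) is slightly above the mock control rather than below it.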
this suggests potent inhibition of pdcov replication triggered by sequence-specific shrnas in st cells. pdcov is a recently discovered porcine enteropathogenic coronavirus [8, 19-21]. since 2015, pdcov has emerged in many provinces, leading to significant economic losses in swine husbandry in china. however, there are presently no effective treatments or vaccines available to control pdcov [6]. there is therefore an urgent need to develop an effective method for treatment of pdcov. rnai is a gene silencing mechanism at the post-transcriptional level with high specificity and can inhibit gene expression with high efficiency. therefore, rnai has been considered as an effective strategy to protect against bacterial and viral pathogens [22, 23]. rnai is triggered by endogenous or exogenous 21-23 nt rna duplexes [17], and shrna and sirna are two commonly used rna molecules to block gene expression [17, 24]. compared with sirna, shrna induces more efficient interference. recently, the interference efficiency of shrna in some coronaviruses has been studied. wang et al. designed and constructed three recombinant plasmids targeting the m gene of porcine tgev. after transfection into pk15 cells, m gene expression was reduced by 13, 68, and 70% [17]. shen et al. [15] constructed five shrna-expressing plasmids targeting the n, m, and s genes of pedv. pedv rna in vero cells pre-transfected with these plasmids was reduced by 22-94.5%. however, because the success rate of pdcov isolation was low [1, 6, 25], there are no reports on the use of shrna to inhibit replication of pdcov, which belongs to the same virus family as pedv and tgev. in our previous study, pdcov strain hb-bd was successfully isolated, serially passaged in cell culture, and characterized.
in this study, we designed two shrnas based on the theoretically valid sequences of the m and n genes of pdcov at nucleotide positions 328-346 (m) and 270-288 (n), and established three shrna recombinant expression plasmids to study whether shrna-mediated rnai inhibited pdcov replication in vitro. to guarantee a similar rnai effect on different pdcov strains, the two theoretically effective sequences were analyzed by blast to ensure that they did not have any similar sequences in the swine genome, but shared 100% similarity with the published sequences of different pdcov strains. both shrnas inhibited pdcov replication in st cells. the interference properties were revealed by reductions in cpe formation, virus tcid50 titers, and viral rna copy numbers in the infected cells. the cpe and tcid50 assays of pgenesil-m- and pgenesil-n-transfected cells showed viral suppression 48 h after infection, and the titers of pgenesil-m- and pgenesil-n-transfected cells were reduced 13.2- and 32.4-fold, respectively. the real-time quantitative rt-pcr assay showed that the viral rna copy number was reduced by 45.8% in pgenesil-m-transfected cells and 56.1% in pgenesil-n-transfected cells. the inhibitory efficiency of pgenesil-n was higher than that of pgenesil-m. although the interference efficiency of shrnas against pdcov in our study was lower than that of the shrnas targeting other coronaviruses [15, 17], these results indicate that rnai against pdcov mediated by shrnas can inhibit pdcov replication in vitro. a disadvantage of this transient transfection/expression system is that virus replication can be inhibited only in cells that are expressing the shrna, and cells not expressing the shrna can be infected. therefore, one explanation for the lower interference efficiency of shrnas against pdcov in this study compared to shrnas targeting other coronaviruses in other studies could be lower transfection efficiency of the shrna expression plasmids.
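the tcid50 titers discussed above were obtained with the reed-muench method on tenfold dilutions with eight wells per dilution. the following python sketch shows how such an endpoint is commonly interpolated. it is a minimal illustration under stated assumptions, not the authors' calculation: the function name `reed_muench_log10_endpoint` is hypothetical, and the well counts in the example are invented for illustration and are not data from the study.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, wells_per_dilution):
    """Return the log10 dilution at which 50% of wells would show CPE.

    log10_dilutions: e.g. [-1, -2, ...], ordered from least to most dilute.
    infected: number of CPE-positive wells at each dilution.
    wells_per_dilution: number of wells inoculated at each dilution.
    """
    n = len(log10_dilutions)
    uninfected = [wells_per_dilution - i for i in infected]
    # cumulative infected: a well positive at a high dilution is assumed
    # positive at every lower dilution, so sum from the most dilute end
    cum_infected = [sum(infected[k:]) for k in range(n)]
    # cumulative uninfected: summed from the least dilute end
    cum_uninfected = [sum(uninfected[: k + 1]) for k in range(n)]
    percent = [
        100.0 * ci / (ci + cu) for ci, cu in zip(cum_infected, cum_uninfected)
    ]
    for k in range(n - 1):
        if percent[k] >= 50.0 > percent[k + 1]:
            # proportionate distance between the two bracketing dilutions
            distance = (percent[k] - 50.0) / (percent[k] - percent[k + 1])
            step = log10_dilutions[k] - log10_dilutions[k + 1]
            return log10_dilutions[k] - distance * step
    raise ValueError("the 50% endpoint is not bracketed by the dilution series")


# hypothetical plate: tenfold dilutions 10^-1 to 10^-6, eight wells each
example_dilutions = [-1, -2, -3, -4, -5, -6]
example_infected = [8, 8, 8, 6, 2, 0]
endpoint = reed_muench_log10_endpoint(example_dilutions, example_infected, 8)
titer_log10 = -endpoint  # log10 tcid50 per inoculum volume
print(f"log10 tcid50 per inoculum volume: {titer_log10:.2f}")
```

with this layout, a fold reduction between two treatments is 10 raised to the difference of their log10 titers.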
in conclusion, our results indicate that both shrna plasmids targeting the m (328-346) and n (270-288) genes of the pdcov genome inhibit pdcov replication in st cells with high efficiency. therefore, the two nucleotide positions are two potential targets for the inhibition of pdcov replication by rnai in vitro. however, whether the two shrnas can inhibit pdcov replication in vivo, and whether shrnas targeting other nucleotide positions of the m and n genes or other genes also inhibit pdcov replication, needs further research.
[1] isolation and phylogenetic analysis of porcine deltacoronavirus from pigs with diarrhoea in hebei province china
[2] full-length genome sequence of porcine deltacoronavirus strain
[3] newly emerged porcine deltacoronavirus associated with diarrhoea in swine in china: identification, prevalence and full-length genome sequence analysis
[4] discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus
[5] detection and genetic characterization of deltacoronavirus in pigs
[6] isolation and characterization of porcine deltacoronavirus from pigs with diarrhea in the united states
[7] functional characterization and proteomic analysis of the nucleocapsid protein of porcine deltacoronavirus
[8] occurrence and sequence analysis of porcine deltacoronaviruses in southern china
[9] complete genome characterization of korean porcine deltacoronavirus strain kor/knu14-04
[10] identification of a conserved linear b-cell epitope in the m protein of porcine epidemic diarrhea virus
[11] a conserved domain in the coronavirus membrane protein tail is important for virus assembly
[12] heterogeneity in membrane protein genes of porcine epidemic diarrhea viruses isolated in china
[13] modular organization of sars coronavirus nucleocapsid protein
[14] identification of a specific interaction between the coronavirus mouse hepatitis virus a59 nucleocapsid protein and packaging signal
[15] effective inhibition of porcine epidemic diarrhea virus by rna interference in vitro
[16] small interfering rna targeting m2 gene induces effective and long term inhibition of influenza a virus replication
[17] inhibition of porcine transmissible gastroenteritis virus infection in porcine kidney cells using short hairpin rnas targeting the membrane gene
[18] a transgenic marc-145 cell line of piggybac transposon-derived targeting shrna interference against porcine reproductive and respiratory syndrome virus
[19] characterization and evolution of porcine deltacoronavirus in the united states
[20] complete genome sequence of porcine deltacoronavirus isolated in thailand in 2015
[21] porcine deltacoronavirus: overview of infection dynamics, diagnostic methods, prevalence and genetic evolution
[22] in-vitro inhibition of spring viremia of carp virus replication by rna interference targeting the rna-dependent rna polymerase gene
[23] potential and development of inhaled rnai therapeutics for the treatment of pulmonary tuberculosis
[24] short hairpin rnas (shrnas) induce sequence-specific silencing in mammalian cells
[25] isolation, genomic characterization, and pathogenicity of a chinese porcine deltacoronavirus strain chn-hn-2014
key: cord-268627-nnx46nwf authors: ren, xiaofeng; li, pengchong title: development of reverse transcription loop-mediated isothermal amplification for rapid detection of porcine epidemic diarrhea virus date: 2011-02-01 journal: virus genes doi: 10.1007/s11262-011-0570-3 sha: doc_id: 268627 cord_uid: nnx46nwf in this study, a reverse transcription loop-mediated isothermal amplification (rt-lamp) was developed for detection of porcine epidemic diarrhea virus (pedv). six primers were designed to amplify the nucleocapsid (n) gene of pedv. the optimization, sensitivity, and specificity of the rt-lamp were investigated. the results showed that the optimal reaction condition for rt-lamp amplifying pedv n gene was achieved at 63°c for 50 min.
the rt-lamp assay was more sensitive than gel-based rt-pcr and enzyme-linked immunosorbent assay. it was capable of detecting pedv from clinical samples and differentiating pedv from porcine transmissible gastroenteritis virus, porcine rotavirus, porcine pseudorabies virus, porcine reproductive and respiratory syndrome virus, and avian infectious bronchitis virus. porcine epidemic diarrhea (ped) is an infectious enteric disease characterized by acute enteritis and diarrhea in pigs, and the infection is more severe in neonates [1]. at present, ped is a major concern in the swine industry, particularly in asia and europe, resulting in large economic losses [2-4]. the causative agent of ped is porcine epidemic diarrhea virus (pedv), an enveloped, single-stranded rna virus that belongs to the family coronaviridae [5]. coronavirus comprises three major viral structural proteins: spike (s, 180-220 kda), membrane (m, 27-32 kda), and nucleocapsid (n, 55-58 kda) proteins [5, 6]. the s protein is a major viral antigen; it binds to a cellular receptor and mediates viral attachment to and entry into target cells [7-13]. the m protein is a trans-membrane protein [14] and it is involved in the assembly process of viral nucleocapsid and membrane [15, 16]. the n protein of coronaviruses is a phosphorylated protein that interacts with virus genomic rna, forming a helical ribonucleoprotein [17]. therefore, it plays important roles in viral genome transcription, core formation and virus assembly [18]. viral n protein is also conserved and can be used as a diagnostic target for detecting viral infection. loop-mediated isothermal amplification (lamp) is a recently developed dna amplification method [19]. lamp uses four to six primers that recognize six to eight regions of target dna, in conjunction with the enzyme bst polymerase, which has strand-displacement activity.
the synchronized dna synthesis by these primers maintains the specificity of the method. the amplification step can be performed under isothermal conditions, resulting in the synthesis of a large amount of dna. lamp proceeds when the forward inner primer (fip) anneals to the complementary region in the target dna and initiates the first-strand synthesis. next, the outer forward primer (f3) hybridizes and displaces the first strand, forming a loop structure at one end. the resulting single-stranded dna serves as a template for backward inner primer (bip)-initiated dna synthesis and subsequent outer backward (b3)-primed strand-displacement dna synthesis. the formed dumbbell-shaped dna stem loop structure serves as a template for subsequent hybridization between one inner primer and the loop, initiating the displacement dna synthesis. the lamp method may form the original stem loop and a new stem loop that is twice as long as the original one. the final products are stem loop dnas with several inverted repeats of the target dna and cauliflower-like structures bearing multiple loops [19]. at present, the lamp approach has been applied for detection of infectious pathogens. examples include h5n1 avian influenza virus [20], hepatitis b virus [21], and foot-and-mouth disease virus [22]. in this study, the authors developed a reverse transcription (rt)-lamp using primers directed toward the n gene of pedv. the convenience, sensitivity, and specificity of the established rt-lamp indicate its advantages and utility in detecting pedv. pedv isolate hljby, porcine transmissible gastroenteritis virus (tgev), porcine pseudorabies virus (prv), porcine rotavirus (prv), porcine reproductive and respiratory syndrome virus (prrsv) and avian infectious bronchitis virus (ibv) were propagated in susceptible cells.
based on the n gene sequence of pedv (genbank accession number: gu321197), a total of six primers targeting the n gene were designed using the primer explorer version 3 (http://primerexplorer.jp/lamp3.0.0/index.html). they include an outer pair (f3, b3), an inner pair (fip, bip), and a loop pair (f-loop, b-loop). a pair of primers (named ped1 and ped2) was used for rt-pcr amplification of the n gene. information regarding the primer names and sequences is shown in table 1.
pedv propagation and rna extraction
pedv was propagated in african green monkey kidney (vero) cells according to reference with modification [23]. in brief, vero cells were cultured in dulbecco's modified eagle medium (dmem) supplemented with 10% newborn bovine serum (excell bio, china) in six-well plates at 37°c to allow the formation of a cell monolayer. the cells were washed with pbs and infected with pedv (500 µl/well) at a multiplicity of infection (moi) of 2 at 37°c for 1 h in the presence of edta-free trypsin at a final concentration of 40 µg/ml. dmem containing edta-free trypsin (40 µg/ml) was then added into the wells (2.5 ml/well) and the culture was maintained at 37°c for 48-36 h. the titer of pedv was 10^6.25 tcid50/ml. the total rnas were extracted from the culture supernatants of pedv, tgev, prv, prrsv, and ibv using the rna extraction kit (keygen biotech, china) and the genomic dna of prv was extracted from virus-infected vero cell culture using the dna extraction kit (omega, norcross, usa) according to the manufacturer's instructions. the extracted rna was subjected to reverse transcription (rt) to synthesize the cdna using reverse primer ped2 and a cdna synthesis kit (haigene, china) according to the manufacturer's instructions. the reaction mixture contained rna template (2 µg sterile water was used as a negative control template.
the amplified dna products from the rt-lamp were analyzed by electrophoresis of 5 µl of rt-lamp reaction mixture in an ethidium bromide-stained 2% agarose gel, where the positive reaction mixtures showed a characteristic ladder of multiple bands. the relative quantification of the dna was performed using the gel documentation system (uvitec, cambridge, uk) and determined with gel analyzer software (copyright 2010 by dr. istvan lazar) according to the manufacturer's instructions. the reaction result was also observed directly without staining because of the white precipitate of magnesium pyrophosphate or the green color produced by the intercalating dye picogreen® (invitrogen, wisconsin, usa) in positive reactions. after the amplification was completed, 2 µl of coloring agent (6× loading buffer:gene finder = 9:1) was added to each test tube and mixed. the test tubes were then examined visually. to determine the optimal reaction temperature, the rt-lamp reaction mixtures were incubated at 62, 63, 64, or 65°c for 30 min. the optimal reaction time was determined by performing the rt-lamp at the optimal temperature for 10, 20, 30, 40, or 50 min. finally, the reaction was terminated by heat inactivation at 80°c for 5 min. the amplified dna products from the rt-lamp assays were visualized by agarose gel electrophoresis as above. the concentration of pedv rna was determined using an ultra-violet photometer (type 752, shanghai spectrum instrument company) according to the manufacturer's instructions. then tenfold serial dilutions of the rna (48 µg/ml) were used as template for rt-lamp and a conventional rt-pcr. the rt-lamp was performed as above. the rt-pcr was performed using a rt-pcr kit (haigene, china). elisa was performed to compare its sensitivity with rt-lamp.
in brief, purified pedv particles (0.1 µg/µl) were serially diluted in carbonate-bicarbonate buffer (15 mm na2co3, 35 mm nahco3, ph 9.6) and the viruses were coated onto elisa plates (100 µl/well) at 4°c overnight. the next day, the plates were blocked with 5% non-fat dry milk in pbs-0.05% tween 20 (pbst) at 37°c for 2 h. subsequently, the wells were incubated with serially diluted anti-pedv polyclonal antibody (1:1000 dilution) or control serum from a non-immunized rabbit at 37°c for 1 h, after a triple wash with pbst. the plates were incubated with horseradish peroxidase-conjugated goat anti-rabbit igg (boster, china; 1:5000 dilution in pbst) at 37°c for 1 h. the wells were incubated with o-phenylenediamine dihydrochloride (opd) substrate for 5 min after complete washing with pbst. the od492 value was examined using an elisa reader. a ratio of the od492 value of the anti-pedv serum well (p) to the od492 value of the control serum well (n) > 2 was regarded as positive. twenty clinical fecal samples from piglets (approx. 3 weeks old) with diarrhea were collected from a pig farm in heilongjiang province in 2010. the samples were prepared as a 10% (w/v) suspension in pbs (ph 7.2) and centrifuged at 2000×g at 4°c for 10 min. the supernatant was subjected to rna extraction with the above-mentioned rna extraction kit. the resulting rna was used as a template for rt-pcr and rt-lamp according to the above-mentioned protocols. at the same time, an equal volume of supernatant was used as coating antigen in elisa as above. the detection limit of the three methods was compared. to analyze the specificity of the rt-lamp, pedv, tgev, prv, prv, prrsv, and ibv were used as templates and subjected to rt-lamp as above. using pedv rna and six primers targeting the pedv n gene, an rt-lamp was done at 65°c in a water bath for 1 h.
the resulting amplified dna products showed a characteristic ladder of multiple bands, indicating that the final products were mixtures of stem loop dnas with various stem lengths (fig. 1a). in contrast, the negative control did not show the characteristic bands.
virus genes (2011) 42:229-235
the results of the rt-lamp reaction were also determined directly by visual inspection. if the reaction product is positive, the gene finder dye inserts into the double-stranded dna after the reaction and the product becomes green; otherwise, the dye does not insert into the double-stranded dna and the reaction sample remains blue (fig. 1b). the effect of reaction temperature and incubation time on the rt-lamp was investigated. as shown in fig. 2a, the dna products of the rt-lamp at different temperatures showed multiple characteristic ladder bands; however, the intensity of dnas determined by gel analyzer software from the reactions at 63°c was stronger than that at other reaction temperatures, so 63°c was judged as the optimal temperature for rt-lamp amplification of the pedv n gene. the rt-lamp was then performed at 63°c for different incubation times. the results indicated that the dna products showed the highest intensity when the reaction was performed for 50 min (fig. 2b). therefore, the optimal reaction condition of the current rt-lamp for pedv was 63°c for 50 min. the sensitivity of the rt-lamp assay was first compared with the conventional rt-pcr using tenfold serial dilutions of rna templates of pedv. the detection limit of rt-pcr was 4.8 × 10^-2 µg/ml, which equals a virus titer of 10^2.25 tcid50/ml, while the rt-lamp had a detection limit of 4.8 × 10^-6 µg/ml (10^0.75 tcid50/ml), indicating a much higher sensitivity than that of rt-pcr (fig. 3). after applying the same concentration of pedv particles in rt-lamp and elisa, the minimal required virus template amount for both assays was analyzed. the results showed that the detection limit of rt-lamp was 1 × 10^-4 µg.
in contrast, elisa had a detection limit of 1 × 10^-3 µg (fig. 4). table 2 ). the rt-lamp had a similar sensitivity to elisa and was somewhat more sensitive than rt-pcr in detection of clinical samples. to analyze the utility of the rt-lamp, several related porcine viruses (i.e., tgev, prv, and prv) and an avian coronavirus, ibv, were used as templates and included in the rt-lamp. the result indicated that no positive dna products of the rt-lamp assay were observed among these control viruses. when pedv was used as template, the positive bands were amplified as expected (fig. 5). the result demonstrated that the rt-lamp assay is specific and can be applied for distinguishing pedv from other viruses. there are ped epidemics in china, although inactivated vaccines are used in some regions in china. establishment of rapid, sensitive, and cost-effective diagnostic assays for detecting pedv is highly desirable. virus isolation has been a popular detection method; nevertheless, the virological diagnosis is somewhat difficult for detecting pedv, since it was not possible until 1988 to propagate porcine epidemic diarrhea virus in cell culture [23]. even now, the viral titer of pedv in cell culture is still low. other diagnostic methods for detecting pedv include immunohistochemistry, in situ hybridization, dot-blot hybridization, rt-pcr, and real-time rt-pcr [24-28]. these methods may require either high-precision instruments or complicated procedures. therefore, they are unsuitable for detecting pedv in the field and in less well-equipped laboratories. the rt-lamp method established in this study is a valuable alternative for detection of pedv, since this dna amplification technology has numerous advantages such as simplicity, rapidity, and low cost. the isothermal conditions required for lamp can be provided with a conventional water bath or heat block.
therefore, the current method can be applied in less well-equipped laboratories and in the field for rapid detection of pedv. in general, lamp can be carried out under isothermal conditions (60-65°c). in this study, the authors optimized the reaction conditions of the rt-lamp by performing the test at different temperatures and time points. subsequently, its sensitivity was compared with that of rt-pcr. the results showed that the rt-lamp specific for pedv was approx. 10,000 times more sensitive than the rt-pcr. nevertheless, it is necessary to screen other optimal primers to further compare the sensitivity between rt-lamp and rt-pcr in the future. moreover, the sensitivity of the rt-lamp and conventional elisa was compared using the inactivated pedv as template. the former is more sensitive than the latter. two reports have pointed out that the detection limit of rt-pcr for pedv was 10^2.0 tcid50/ml [29, 30]. the detection limit of a commercially available elisa kit (jinma, shanghai) used in china was 0.1 ng/ml, which was the same as the detection limit of the rt-lamp developed in this study. this result further confirmed the sensitivity of the rt-lamp for amplifying the n gene of pedv. nonetheless, when the authors used these methods to detect pedv from clinical samples, the sensitivity of rt-lamp was somewhat higher than that of rt-pcr and similar to that of elisa. more experiments are needed to clarify this point in the future; however, the rt-lamp still has advantages including simplicity, rapidity, and convenience. to analyze the specificity of the rt-lamp for pedv, several related or unrelated viruses were used as control templates. for example, pedv and tgev belong to the group i coronaviruses which are closely related [31]. ibv and prrsv belong to the group iii coronavirus and arterivirus, respectively; however, both viruses belong to the order nidovirales [32, 33].
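the roughly 10,000-fold sensitivity difference between rt-lamp and rt-pcr quoted in this discussion follows from the two detection limits reported for the tenfold dilution series of the 48 µg/ml rna stock. the python sketch below lays out that arithmetic. variable names are illustrative assumptions, and the unit of the rt-lamp limit (µg/ml) is assumed to match that of the rt-pcr limit, since the source gives the second value without a unit.

```python
import math

stock_ug_per_ml = 48.0  # rna stock concentration given in the methods

# tenfold serial dilutions of the stock: 48, 4.8, 0.48, ... µg/ml
dilution_series = [stock_ug_per_ml * 10.0 ** -step for step in range(8)]

rt_pcr_limit = 4.8e-2   # µg/ml, reported detection limit of rt-pcr
rt_lamp_limit = 4.8e-6  # assumed µg/ml, reported detection limit of rt-lamp

# a lower detection limit means a more sensitive assay
fold_more_sensitive = rt_pcr_limit / rt_lamp_limit
log10_gain = math.log10(fold_more_sensitive)

# number of tenfold steps from the stock down to each detection limit
steps_to_rt_pcr = round(math.log10(stock_ug_per_ml / rt_pcr_limit))
steps_to_rt_lamp = round(math.log10(stock_ug_per_ml / rt_lamp_limit))

print(f"rt-lamp is about {fold_more_sensitive:.0f}-fold more sensitive "
      f"({log10_gain:.1f} log10 units)")
print(f"rt-pcr limit reached after {steps_to_rt_pcr} tenfold dilutions, "
      f"rt-lamp limit after {steps_to_rt_lamp}")
```

the same ratio applied to the elisa comparison (1 × 10^-3 µg versus 1 × 10^-4 µg) gives a tenfold difference.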
the structural similarity between the n proteins of ibv and prrsv suggests that members of the coronaviridae and arteriviridae families share a mechanism of filamentous nucleocapsid formation, with suitable alterations necessary to interact specifically with their respective genomes [34, 35]. pseudorabies virus (prv) and porcine rotavirus (prv) are members of the families herpesviridae and reoviridae, respectively. viruses such as pedv, tgev, pseudorabies virus, porcine rotavirus, and prrsv may cause co-infection in pigs. the results showed that the rt-lamp succeeded only when pedv served as template, indicating that the established method is specific and applicable for differential diagnosis. to the authors' knowledge, this is the first report regarding the establishment and optimization of an rt-lamp for the pedv n gene. the assay may be useful for the clinical diagnosis of pedv infection. acknowledgments. the authors acknowledge funding from the program for new century excellent talents in heilongjiang provincial university (1155-ncet-005). key: cord-346643-os2kyvvf authors: wang, li; dai, xianjin; song, han; yuan, peng; yang, zhou; dong, wei; song, zhenhui title: inhibition of porcine transmissible gastroenteritis virus infection in porcine kidney cells using short hairpin rnas targeting the membrane gene date: 2016-11-15 journal: virus genes doi: 10.1007/s11262-016-1409-8 sha: doc_id: 346643 cord_uid: os2kyvvf the membrane (m) protein is the most abundant component of the porcine transmissible gastroenteritis virus (tgev) particle. to explore the possibility of using rna interference (rnai) as a strategy against tgev infection, three plasmids (prnat-1, prnat-2, and prnat-3) expressing short hairpin rnas were designed to target three different coding regions of the m gene of tgev.
the plasmids were constructed and transiently transfected into a porcine kidney cell line, pk-15, to determine whether these constructs inhibited tgev production. the analysis of cytopathic effects demonstrated that prnat-2 and prnat-3 could protect pk-15 cells against pathological changes specifically and efficiently. additionally, indirect immunofluorescence and 50% tissue culture infectious dose (tcid50) assays showed that prnat-2 and prnat-3 effectively inhibited the multiplication of the virus at the protein level. quantitative real-time pcr further confirmed that the amounts of viral rnas in cell cultures pre-transfected with the three plasmids were reduced by 13, 68, and 70%, respectively. this is the first report showing that rnai targeting of the m gene can inhibit tgev infection. our results could promote studies of the specific function of viral genes associated with tgev infection and might provide a theoretical basis for potential therapeutic applications. transmissible gastroenteritis coronavirus (tgev) is a positive-strand rna virus and a member of a large family of enveloped viruses [1]. pigs of any age and breed can be infected. in particular, suckling piglets at about 2 weeks old are the most susceptible, showing mortality rates up to 100%, which results in large economic losses in swine-producing areas worldwide [2, 3]. however, the pathogenic mechanism of tgev remains unclear [4]. at present, several vaccines to prevent tge are available; however, their efficacies are variable. attenuated tgev vaccines carry the risk of reverting to the virulent form and might even induce an adverse reaction, and inactivated viruses are not sufficiently protective in pigs [5, 6]. moreover, newborn piglets can suffer from gastroenteritis within 20 h post-infection, and death can occur in 1-4 days [7], whereas current vaccines cannot provide complete protection in the first 7 days after inoculation.
thus, it is necessary to develop novel, highly effective, and rapid-acting antivirals to resist tgev infection [8]. rna interference (rnai) is a precise gene silencing method that uses double-stranded rna (dsrna) molecules comprising 19-27 nucleotides (nt). rnai in the form of small interfering rnas (sirnas) or short hairpin rnas (shrnas) has been studied for its interference with virus replication [9, 10]. recent research suggests that the replication of various viruses, including many coronaviruses, could be inhibited effectively in vitro and in vivo [11] [12] [13] [14] [15] [16]. therefore, it might be possible to disrupt the replication of tgev in cell culture using shrnas targeting the m gene of tgev. tgev is a positive-sense, ssrna virus with a 28.5 kb genome that contains a leader sequence at the 5′ end and a poly(a) tail at the 3′ end, and that encodes four structural proteins [spike (s), membrane (m), nucleocapsid (n), and envelope (e)] and five non-structural proteins [17] [18] [19]. the s protein is a major membrane glycoprotein that plays important roles in inducing a protective immune response, and in virus attachment, membrane fusion, and viral pathogenicity [20] [21] [22]. the n protein, together with the genomic rna, forms the viral nucleocapsid [22]. the e protein regulates virion assembly and release [23]. the m protein is the most abundant component of the coronavirus particle [24] and differs from other viral proteins in terms of its structure, processing, and intracellular transport [25]. the expression of the m and e proteins might be sufficient to trigger the formation of virus-like particles (vlps).
in addition, m is highly conserved among different strains, and our previous studies showed that expression of the m protein alone using a baculovirus expression system could lead to the formation of vlps, as observed under a transmission electron microscope, which further confirmed that the m protein of tgev is a decisive protein for the proliferation of viral proteins [26]. as one of the important structural proteins of tgev particles, the m protein is exposed on the viral internal core [27] and associates with the golgi complex in the cell, which suggests that the m protein plays a mechanistic role at the site of virus assembly and budding [28] and that m is an indispensable component for the replication of virus particles in host cells. in this study, we constructed three shrnas in a plasmid expression system that targeted the m gene and investigated whether shrna-mediated rna interference could inhibit tgev infection of pk-15 cells. virus and cells. tgev strain cq was isolated from sick piglets with symptoms of diarrhea [29] and stored in our laboratory. pk-15 cells were grown in high-glucose dulbecco's modified eagle's medium (dmem) supplemented with 10% fetal bovine serum (gibco, usa), 100 iu of penicillin and streptomycin per ml, at 37°c in a 5% co2 atmosphere incubator. following the general principles and guidelines for the design of rna interference, ambion's online sirna target design tool (http://www.ambion.com/techlib/misc/sirna_finder.html) was used to choose the three best target sequences within the m gene of tgev cq. three theoretically effective sequences at nucleotide positions 103-121 (rnat-1), 358-376 (rnat-2), and 625-643 (rnat-3) were selected. the sequences were analyzed by blast to ensure that they had no similar sequences in the swine genome but shared 100% similarity with the published sequences of different tgev strains.
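as an illustration of the target selection described above, the sketch below shows how 1-based inclusive coordinates such as 103-121 map to a 19-nt window, and how a sense/loop/antisense hairpin template could be laid out from such a window. it is a minimal sketch and not the procedure used in this study: the function names are hypothetical, the toy gene is a placeholder rather than the tgev cq m gene, and the loop, terminator, and restriction-site overhang sequences are assumed values chosen only for illustration.

```python
# Illustrative sketch only. The loop, terminator and overhang sequences below
# are assumptions, not the sequences used in the study.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def target_window(gene: str, start: int, end: int) -> str:
    """Return the target at 1-based, inclusive positions start..end."""
    if not 1 <= start <= end <= len(gene):
        raise ValueError("coordinates fall outside the gene")
    return gene[start - 1:end].upper()


def hairpin_template(target: str,
                     loop: str = "TTCAAGAGA",        # assumed loop
                     terminator: str = "TTTTTT",     # assumed termination signal
                     five_prime: str = "GATCC",      # assumed BamHI-compatible end
                     three_prime: str = "A") -> str:  # assumed HindIII-compatible end
    """Top-strand oligo: 5' end + sense + loop + antisense + terminator + 3' end."""
    antisense = reverse_complement(target)
    return five_prime + target + loop + antisense + terminator + three_prime


if __name__ == "__main__":
    toy_gene = "ACGTTGCA" * 100  # placeholder sequence, NOT the TGEV M gene
    coordinates = {"rnat-1": (103, 121), "rnat-2": (358, 376), "rnat-3": (625, 643)}
    for name, (start, end) in coordinates.items():
        target = target_window(toy_gene, start, end)
        print(name, len(target), hairpin_template(target))
```

each of the three coordinate pairs spans 19 nt because the positions are inclusive (end - start + 1). in practice the complementary bottom-strand oligo would also be needed for annealing; it is omitted here to keep the sketch short.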
these three sequences are listed in table 1. all the sequences were arranged in the following order: bamhi + sense + loop + antisense + termination + hindiii. we designed the double-stranded oligo dna hairpin structures to target the m gene after annealing. all the shrna-expressing plasmids were diluted with tris-edta buffer to a final concentration of 1 μg/μl. the annealing reaction system (25 μl) comprised 5 μl of shrna sense template, 5 μl of antisense template, and 15 μl of ddh2o. the mixture was heated to 95°c for 5 min, cooled to 50°c for 30 s, and then incubated at 4°c. the annealed shrna dna sequences (rnat-1, rnat-2, and rnat-3) and the shrna expression vector, prnat-u6.1/neo (ribobio, china), were then double digested with bamhi and hindiii, and the annealed sequences were inserted into bamhi-hindiii digested prnat-u6.1/neo to yield prnat-1, prnat-2, and prnat-3, respectively. after transformation of escherichia coli dh5α competent cells to obtain the recombinant plasmids, the positive clones were identified by pcr and sequencing analysis. the enhanced green fluorescent protein fusion gene in the plasmids was used as a reporter during the transfection efficiency analysis. one day before transfection, 3 × 10^5 pk-15 cells were seeded into six-well plates and incubated for 24 h at 37°c in a 5% co2 atmosphere without antibiotics. when the cells reached 50-70% confluence, they were washed with 0.1 m pbs (ph 7.4) three times and overlaid with transfection complexes containing 2.5 μg of prnat-1, prnat-2, prnat-3, or prnat-nc in 125 μl of dmem medium mixed with lipofectamine™ 3000 (invitrogen, usa), according to the manufacturer's instructions. the transfection complexes were completely removed after incubating for 24 h, and the medium was replaced with medium containing 2% fbs and 600 μg/ml g418. after maintenance for 15 d in selection medium, resistant cell clones were selected, cultured, and infected with tgev at an moi of 0.1 per well in six-well plates.
non-transfected cells were used as a control. cell transfection efficiency and cpe images were captured under an inverted fluorescence/phase-contrast microscope (nikon, japan). shrna-transfected cells were collected 48 h after viral infection, subjected to three freeze-thaw cycles, serially diluted tenfold from 10^-1 to 10^-10, and added to 96-well plates. each dilution was added to eight wells. the tcid50 was calculated using the reed and muench method. to quantify the effect of shrna on viral replication at 48 h post viral infection, total rna was extracted from pk-15 cells using the rnaiso plus (invitrogen, usa) reagent, according to the manufacturer's instructions, and reverse transcribed into cdna using the goscript™ reverse transcription system (promega, usa), also according to the manufacturer's instructions. quantitative real-time pcr (qpcr) analysis was performed to amplify the m gene using the cdna as the template and the β-actin gene as the internal standard. western blotting. pk-15 cells were transfected with prnat-nc, prnat-1, prnat-2, or prnat-3 and infected with tgev. cells as well as virus particles were lysed in phosphate buffered saline (pbs), and the total proteins were separated using 12% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (sds-page) and transferred onto a polyvinylidene difluoride membrane. the membranes were incubated with rabbit anti-tgev polyclonal primary antibodies (1:50 dilution, 4°c, overnight), washed, and then incubated with hrp-goat-anti-rabbit secondary antibody (1:5000 dilution, room temperature, 2 h). effects of shrna transfection. pk-15 cells (3 × 10^5 cells per well) were plated in six-well plates and transfected with shrna recombinant plasmids (prnat-1, prnat-2, or prnat-3) and empty plasmid (prnat-nc), separately, for 24 h, before being examined by fluorescence and phase-contrast microscopy.
the gfp gene expressed the green fluorescent protein from the cmv promoter, and more green fluorescence under blue-light excitation was observed in cells containing the empty plasmid (prnat-nc) than in cells transfected with the recombinant plasmids (prnat-1, prnat-2, prnat-3). the normal pk-15 cells showed no fluorescence (fig. 1). the results showed that shrna recombinants were transfected into pk-15 cells successfully and that stably transfected cell lines were created. the transfection efficiencies were similar among the three recombinant plasmids, while that of the empty plasmid was higher. to study the tgev-induced cpe, pk-15 cells were infected with tgev at an moi of 0.1. the virus-infected cells without plasmid (mock control) and those carrying the empty plasmid (prnat-nc) exhibited obvious morphological changes at 48 h post-infection, including cell shrinkage, rounding, and detachment, in contrast to the non-infected cells (normal), which remained tightly attached to the plate and maintained their shape. as shown in fig. 2, the normal group grew well, whereas the cells harboring the shrna-expressing plasmids prnat-2 and prnat-3 showed only small patches of cpe, such as rounding, shrinking, and morphological changes of the cells, as well as shedding from the brim of the wells. interestingly, the cells harboring recombinant prnat-2 and prnat-3 were mostly capable of resisting the cpe, as shown by the observation that the cells attached well and had reduced areas of cpe, which contrasted with the large area of severe cpe in the cells harboring prnat-1. these results indicated that the shrna-expressing plasmids prnat-2 and prnat-3 inhibited tgev-induced cpe to a certain degree and could relieve the specific cytopathic effect compared with the controls. to investigate the inhibition of tgev replication by the shrnas, virus titers in pk-15 cells were calculated by the reed-muench method.
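the reed-muench endpoint calculation can be sketched as follows for a tenfold dilution series with a fixed number of wells per dilution. this is a minimal sketch under stated assumptions, not the authors' worksheet: the function names are hypothetical, the well counts in the example are invented, and the inoculum volume per well (0.1 ml) is an assumption, since it is not stated in the text.

```python
import math


def reed_muench_log10_tcid50(log10_dilutions, infected, wells_per_dilution):
    """Return log10 of the 50% endpoint titer per inoculum volume.

    log10_dilutions: e.g. [-1, -2, ..., -10], most concentrated first.
    infected: number of CPE-positive wells at each dilution.
    """
    uninfected = [wells_per_dilution - n for n in infected]
    n = len(infected)
    # Cumulative infected wells are pooled from the most dilute end upward;
    # cumulative uninfected wells are pooled from the most concentrated end down.
    cum_infected = [sum(infected[i:]) for i in range(n)]
    cum_uninfected = [sum(uninfected[:i + 1]) for i in range(n)]
    percent = [100.0 * ci / (ci + cu)
               for ci, cu in zip(cum_infected, cum_uninfected)]
    for i in range(n - 1):
        if percent[i] >= 50.0 > percent[i + 1]:
            # Proportionate distance between the two bracketing dilutions.
            distance = (percent[i] - 50.0) / (percent[i] - percent[i + 1])
            step = log10_dilutions[i] - log10_dilutions[i + 1]
            endpoint = log10_dilutions[i] - distance * step
            return -endpoint
    raise ValueError("the 50% endpoint is not bracketed by the dilution series")


def per_ml(log10_titer, inoculum_ml):
    """Convert a log10 titer per inoculum volume to a log10 titer per ml."""
    return log10_titer - math.log10(inoculum_ml)


if __name__ == "__main__":
    dilutions = list(range(-1, -11, -1))        # 10^-1 ... 10^-10
    positive = [8, 8, 8, 8, 6, 2, 0, 0, 0, 0]   # hypothetical well counts
    titer = reed_muench_log10_tcid50(dilutions, positive, wells_per_dilution=8)
    print(f"10^{titer:.2f} TCID50 per inoculum")
    print(f"10^{per_ml(titer, 0.1):.2f} TCID50/ml (assuming 0.1 ml per well)")
```

with the invented counts above, the cumulative percentages bracketing 50% are 80% (at 10^-5) and 20% (at 10^-6), so the proportionate distance is 0.5 and the endpoint lies at 10^-5.5.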
figure 3 shows that the titers of tgev reached 10^4.74, 10^3.42, and 10^3.67 tcid50/ml at 48 h post-infection in cells harboring prnat-1, prnat-2, and prnat-3, respectively. these titers corresponded to 3.4-, 70.8-, and 39.8-fold reductions, respectively, compared with that of prnat-nc. the tgev titer was 10^5.71 tcid50/ml in cells receiving no plasmid (mock) transfection, which was higher than the titer of 10^5.27 tcid50/ml in cells pretransfected with prnat-nc. there was a significant difference between prnat-2 and prnat-nc (p < 0.01), as well as between prnat-3 and prnat-nc (p < 0.05). in contrast, there was no significant difference between prnat-1 and prnat-nc. these data indicated that prnat-2 and prnat-3 resisted tgev infection by significantly reducing the levels of progeny virus production in pk-15 cells. in addition, prnat-2 and prnat-3 showed partial inhibition of virus infection, with prnat-1 being the least effective shrna. the expression levels of the m gene in pk-15 cells treated with different interfering plasmids were examined using qpcr. figure 4 shows the cellular expression of the m gene. when the cells were transfected with prnat-1, the expression of the m gene decreased by 13% compared with the cells transfected with prnat-nc. when the cells were transfected with prnat-2 or prnat-3, the expression of the m gene decreased by 68 and 70%, respectively, compared with cells transfected with prnat-nc. the results showed that prnat-2 and prnat-3 have a certain inhibitory effect on the proliferation of tgev in pk-15 cells, which is caused by degradation of the viral rna. to further investigate the levels of viral proteins in cells transfected with shrna plasmids and infected with tgev, viral proteins were assessed using western blotting. equal amounts of cell lysates from tgev-infected and mock-infected pk-15 cells at 48 h were examined using positive anti-tgev serum.
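the fold reductions quoted above for the tcid50 titers follow directly from the differences between the log10 titers. the short check below reproduces that arithmetic from the reported values; the function name is hypothetical.

```python
def fold_reduction(log10_reference: float, log10_treated: float) -> float:
    """Fold difference between two titers given as log10 values."""
    return 10 ** (log10_reference - log10_treated)


if __name__ == "__main__":
    reference = 5.27  # log10 TCID50/ml reported for prnat-nc
    treated = {"prnat-1": 4.74, "prnat-2": 3.42, "prnat-3": 3.67}
    for name, log10_titer in treated.items():
        print(name, round(fold_reduction(reference, log10_titer), 1))
```

the same function applied to the mock titer (10^5.71) and the prnat-nc titer (10^5.27) gives a difference of about 2.8-fold between the two controls.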
figure 5 shows that the amount of viral protein recovered from cells transfected with prnat-2 or prnat-3 was reduced, while the amount of viral protein recovered from cells transfected with prnat-1 was similar to that recovered from cells without an interfering plasmid, which was consistent with the qpcr analysis. rnai has been used widely to silence target genes in mammalian and human cells [30] [31] [32]. rnai can regulate specific gene expression and is closely related to the inhibition of virus replication. rnai has excellent prospects for compensating for the shortcomings of traditional antiviral vaccines and related inhibitors. rnai has emerged as a potentially important therapeutic antiviral strategy [8] [33] [34] [35]. recently, several kinds of animal viruses, such as porcine reproductive and respiratory syndrome virus [36, 37], newcastle disease virus [38], classical swine fever virus [39], porcine circovirus [40], infectious bursal disease virus [41], and porcine hemagglutinating encephalomyelitis virus [42] have been silenced effectively, and most of these viruses are rna viruses. tgev is a porcine coronavirus with an rna genome; therefore, it should also be sensitive to rnai [43]. several studies have reported the application of rnai against tgev replication. effective suppression of tgev infection in swine testicular (st) cells was achieved using dna-based vectors expressing sirnas or shrnas targeting the rna-dependent rna polymerase gene of tgev [8, 15]. he et al. reported the effective inhibition of tgev infection in st cells or pk-15 cells using dna-based vectors expressing an shrna targeting the transcription of tgev gene 7 (a non-structural gene) [4, 44]. however, there is no report showing that sirna/shrna targeting the m gene of a coronavirus could efficiently inhibit viral infection.
in this study, we constructed three shrna plasmid expression systems to target the m gene and investigated whether shrna-mediated rna interference could inhibit tgev infection in pk-15 cells. our results demonstrated that the infection of tgev in cell culture could be disrupted by shrnas targeting the m gene of tgev: two of the three shrnas generated from the m gene of tgev blocked viral infection efficiently. the cpe and tcid50 assays revealed that cells transfected with prnat-1, prnat-2, and prnat-3, harboring three sequence-specific shrnas, could trigger inhibition of tgev infection at 48 h post-infection; prnat-2 in particular showed marked suppression. western blotting and qpcr analyses further confirmed that the efficient inhibition of viral infection was caused by degradation of the viral rna. however, the qpcr analysis showed that transfection with prnat-2 and prnat-3 inhibited viral infection by the equivalent of 70%. the qpcr analysis and western blotting assays also demonstrated that, compared with the mock control, the amount of viral rnas in the prnat-1 group decreased only slightly, which suggested an inefficient inhibitory effect and possibly indicated that the prnat-1 sequence results in non-specific inhibition or in 'off-target' effects. overall, the variability of viral suppression could be related to the following two aspects. one is that the regulation of rna transcription and protein expression is a very complex process, and represents the combined effect of various factors. the other possible explanation is the difference in sensitivity and accuracy between the tcid50 assay and qpcr. qpcr is highly sensitive in detecting the suppression effect of rna interference.
in addition to the potent inhibition shown by two sequence-specific shrnas, the tcid50 and qpcr analyses also demonstrated that, compared with the mock control, the amount of viral rnas in the negative control prnat-nc cells also decreased slightly, which suggested a non-specific effect on tgev replication in pk-15 cells. similarly, other researchers have discussed an 'off-target' effect induced by sirna or shrna in their reports. lu et al. [45] found that the non-specific effect was positively related to the concentration of the shrnas. overall, compared with the low-efficiency inhibition and 'off-target' effects of prnat-1, the other two sequence-specific shrnas exhibited the potential to silence tgev rnas. in conclusion, our results indicated that shrnas targeting the m gene in the tgev genome could effectively block infection of tgev in pk-15 cells. this finding showed that shrnas could represent a potential novel tool against tgev infection. these results also provided an insight into the inhibition of tgev infection by targeting the m gene. taken together, the present data and the known advantages of shrna technology suggest that shrna represents a candidate agent for tgev therapeutic applications. key: cord-259398-s8qsjkj2 authors: chouljenko, vladimir n.; kousoulas, konstantin g.; lin, xiaoqing; storz, johannes title: nucleotide and predicted amino acid sequences of all genes encoded by the 3′ genomic portion (9.5 kb) of respiratory bovine coronaviruses and comparisons among respiratory and enteric coronaviruses date: 1998 journal: virus genes doi: 10.1023/a:1008048916808 sha: doc_id: 259398 cord_uid: s8qsjkj2 the 3′-ends of the genomes (9538 bp) of two wild-type respiratory bovine coronavirus (rbcv) isolates lsu and ok were obtained by cdna sequencing.
in addition, the 3′-end of the genome (9545) of the wild-type enteric bovine coronavirus (ebcv) strain ly-138 was assembled from available sequences and by cdna sequencing of unknown genomic regions. comparative analyses of rbcv and ebcv nucleotide and deduced amino acid sequences revealed that rbcv-specific nucleotide and amino acid differences were disproportionally concentrated within the s gene and the genomic region between the s and e genes. comparisons among virulent and avirulent bcv strains revealed that virulence-specific nucleotide and amino acid changes were located within the s and e genes, and the 32 kda open reading frame. coronaviruses are important etiological agents of human and animal diseases including respiratory infection, gastroenteritis, hepatic and neurological disorders as well as immune-mediated disease such as feline infectious peritonitis, and other persistent infections (1, 2). enteric bovine coronaviruses (ebcv) are generally associated with enteric disease of newborn calves and winter dysentery of adult cattle (2). recently, numerous respiratory bovine coronaviruses (rbcv) were isolated in our laboratory from cattle arriving with fever and respiratory disease in feedlots or livestock shows of 8 different states in the usa. the cytopathogenic, cell fusion, and other phenotypic properties of these viruses were different from the known ebcv (3). coronaviruses contain a single-stranded, capped, and polyadenylated positive-sense (infectious) rna molecule of approximately 30 kb length, which directs the synthesis of a nested set of subgenomic mrnas (4, 5). the 3′-end of the genomic rna consists of approximately 9.5 kb and contains the spike (s) glycoprotein, the hemagglutinin-esterase (he) glycoprotein, the integral membrane (m) protein, the small membrane protein (e) and the phosphorylated nucleocapsid (n) protein and a number of orfs potentially encoding non-structural proteins (ns) (5).
the 32 kda non-structural protein is a phosphoprotein that accumulates in the cytoplasm of infected cells (6, 7). it is not known whether the 12.7 and 4.8 kda orfs are expressed in infected cells, while the 4.9 kda putative protein, most likely, is not translated (8). bcv uses n-acetyl-9-o-acetylneuraminic acid as receptor determinant to initiate infection (9). although the he glycoprotein also has an affinity for 9-o-acetylated sialic acid, the s glycoprotein was identified as the major sialic acid binding protein of bcv (10). the s glycoprotein facilitates viral attachment to susceptible cells, causes cell fusion after cell-surface expression (fusion from within), and induces viral infectivity neutralizing antibodies (1). porcine transmissible gastroenteritis virus (tgev) strains were isolated which exhibited respiratory tissue tropism. these viruses contained point mutations or deletions within the first 250 aa of tgev s1 which were associated with reduced enteropathogenicity and loss of hemagglutinating activity (11-13). to examine the genetic basis for the phenotypic differences between rbcv and ebcv, we cloned and sequenced the 3′-end of the viral genomes of two virus strains rbcv-lsu-94lss-051-2 (lsu) and rbcv-ok-0514-3 (ok) that originated from louisiana and oklahoma cattle, respectively. we report here the nucleotide and predicted amino acid sequences of all genes encoded by the 3′ genomic portion (9.5 kb) of two wild-type rbcv strains and comparisons among respiratory and enteric coronaviruses. viruses and cell line. all rbcv and ebcv strains were propagated in the g clone of human rectal tumor cells (hrt-18g) developed recently through selection and medium modulation (3). supernatant fluids from infected hrt-18g cells were collected and viruses were purified as described (14). rbcv ok and lsu virus stocks were tested at the third and fourth passages, respectively.
ebcv ly-138 (ly) virus stocks were prepared at the second passage, while the ebcv-l9 virus strain, derived from the ebcv-mebus strain, had been propagated 80 times in cell cultures. strategy for cdna construction and assembly of the 9.5 kb cdna sequence representing the 3′-end of different bcv strains. tri reagent from the molecular research center, inc. (cincinnati, oh, usa), was used for total rna extraction. ready-to-go you-prime first-strand beads from pharmacia biotech inc. (uppsala, sweden) were used for cdna library construction. all amplifications were performed using the gene-amp pcr system 9600 (perkin-elmer, norwalk, ct, usa) with pcr reagents and amplitaq from perkin-elmer. the ta cloning kit from invitrogen inc. (san diego, ca) was used for cloning of rt-pcr products. restriction enzymes were obtained from new england biolabs (beverly, ma, usa). the 3′ genomic end of bcv mebus strain consisting of 8695 nucleotides was assembled from available sequences deposited in genbank. the accession numbers of the cdna sequences used to assemble the ebcv mebus genome were m31053 for s and he genes, m31054 for the 4.9, 4.8, 12.7 and 9.5 kda (e) orfs, and m16620 for the m, n genes and i orf. the assembled mebus genomic sequence did not contain the 32 kda orf. an 850 nt sequence containing the 32 kda orf of bcv-quebec (accession number x15445) was used for comparisons with other bcv. the s and he cdna sequences specified by ly-138 were previously reported (15, 16). a series of overlapping cdna clones representing the entire 3′-end of two rbcv isolates and unpublished sequences of ly-138 were constructed. two cdna libraries were produced: one library was made using the bcv3′ primer representing the 3′ terminus of the genomic rna, and a second cdna library was produced using an oligonucleotide (3b11) to prime cdnas starting at nucleotide 6345 (counting from the 3′-end of the viral genome) (fig. 1).
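joining the sequences of overlapping cdna clones into one contiguous sequence can be sketched as below. this is a minimal sketch, not the software used in this study: it assumes the clone sequences are already ordered 5′ to 3′, are on the same strand, and overlap by exact matches, and the function names and the short example fragments are hypothetical. real assemblies must also tolerate sequencing errors in the overlaps.

```python
def overlap_length(left: str, right: str, min_overlap: int = 5) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_ordered_fragments(fragments, min_overlap: int = 5) -> str:
    """Merge fragments, given in 5' to 3' order, through their exact overlaps."""
    contig = fragments[0]
    for fragment in fragments[1:]:
        size = overlap_length(contig, fragment, min_overlap)
        if size == 0:
            raise ValueError("adjacent fragments do not overlap")
        contig += fragment[size:]   # append only the non-overlapping tail
    return contig


if __name__ == "__main__":
    # Hypothetical toy fragments; each overlaps its neighbour by 7 nt.
    clones = ["ATGGCGTACGTTAGC", "CGTTAGCCTAGGATC", "TAGGATCCGATTACA"]
    print(merge_ordered_fragments(clones))
```

requiring a minimum overlap guards against joining clones through a chance match of only a few bases.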
the entire 9545 nt sequence representing the 3′-end of the bcv genome was divided into six overlapping cdna regions. each cdna was amplified by pcr using specific primer pairs. primer pair 5f6/bcv3′ amplified a cdna fragment containing the m and n genes. primer pair 5f5/3b3 amplified a cdna fragment containing the 3′-end of s, 4.9 kda, 4.8 kda, 12.7 kda, e, m and the 5′-end of n genes. primer pair b5′/b3′ amplified the 3′-end of the spike gene, primer pair 5f24/a3′ amplified a cdna that coded for the carboxy-terminal portion of the s1 subunit, primer pair a5′/3b11 amplified a cdna fragment that coded for the amino-terminus of s, and primer pair 5f16/3b10 amplified the 32 kda and he genes. the b3′ and a5′ primers were designed to contain extra bamhi and ecori sites for cloning purposes, while a bstxi site was naturally present in the 5f24 primer. the actual primer sequences are: dna sequencing and analyses. dna sequencing was carried out with the modified dideoxynucleotide chain termination procedure (17). overall comparisons of genes and predicted proteins specified by rbcv (lsu, ok) and ebcv (ly, mebus) strains. to establish the close evolutionary relationship between lsu and ok strains and to ascertain rbcv-specific amino acid changes (conserved in lsu and ok but different in other strains), a pairwise comparison of nucleotide and amino acid differences among bcv strains was performed for all orfs except those coding for the rna-dependent rna polymerase and the 32 kda protein (table 1). in general, the nucleotide and amino acid sequences of rbcv strains lsu and ok were more similar to each other than to those of ebcv strains ly-138 and mebus, and they were more divergent from the mebus strain than from ly-138. specifically, the amino acid sequences of m specified by lsu and ok were identical, while they differed by one and two aa from those of mebus and ly-138, respectively.
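the pairwise comparison and the definition of rbcv-specific changes given above (conserved in lsu and ok but different in other strains) can be sketched for pre-aligned sequences of equal length as follows. this is a minimal sketch, not the authors' analysis: the function names are hypothetical and the four short sequences are invented, not bcv data.

```python
from itertools import combinations


def count_differences(a: str, b: str) -> int:
    """Number of alignment columns at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_differences(aligned: dict) -> dict:
    """Difference counts for every pair of strains in the alignment."""
    return {(m, n): count_differences(aligned[m], aligned[n])
            for m, n in combinations(aligned, 2)}


def group_specific_positions(aligned: dict, group: set) -> list:
    """1-based columns where all group members share a residue absent elsewhere."""
    others = [name for name in aligned if name not in group]
    length = len(next(iter(aligned.values())))
    positions = []
    for col in range(length):
        group_residues = {aligned[name][col] for name in group}
        other_residues = {aligned[name][col] for name in others}
        if len(group_residues) == 1 and group_residues.isdisjoint(other_residues):
            positions.append(col + 1)
    return positions


if __name__ == "__main__":
    toy = {                      # invented sequences, not BCV data
        "lsu":   "MKTLSAVG",
        "ok":    "MKTLSAVG",
        "ly":    "MRTLSAIG",
        "mebus": "MRTLTAIG",
    }
    for pair, n in pairwise_differences(toy).items():
        print(pair, n)
    print("group-specific columns:", group_specific_positions(toy, {"lsu", "ok"}))
```

in the toy alignment, column 5 differs only in one enteric strain, so it counts as strain-specific rather than group-specific; only columns where both group members agree and every other strain differs are reported.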
the s glycoprotein specified by lsu differed by only 4 amino acids from that of ok, while s glycoproteins of lsu and ok differed by 22 and 33 amino acids in comparison with the ly and mebus s sequences, respectively. furthermore, lsu and ok sequences of the n and i orf (located within n) were more conserved to each other than to any other strain compared. most amino acid changes within he, 4.9, 4.8, 12.7 kda orfs, e, n and the i orf were strain-specific. he and m contained one rbcv-specific aa change, and n and i orf contained two rbcv-specific aa changes each. rbcv-specific amino acid substitutions within s. the s1 subunit contained most of the rbcv-specific aa substitutions and included an amino acid change within the signal sequence as well as two clusters of amino acid substitutions within the amino-terminus and the hypervariable region (fig. 2). the proteolytic cleavage site that separates s1 and s2 subunits was conserved among rbcv, ly-138 and mebus strains. in contrast to the high number of rbcv-specific substitutions within s1, s2 contained only two rbcv-specific amino acid changes, an ala 769 to ser change immediately adjacent to the proteolytic cleavage site and an asp 1026 to gly located within the heptad repeat sequence. the rbcv-g95 strain was isolated from a nasal sample of a calf that had diarrhea and signs of respiratory distress (18). the nucleotide and predicted primary structures of s and he glycoproteins specified by rbcv-g95 were reported previously (19, 20). lsu and ok had ten unique aa substitutions within s in comparison to all other bcv strains, while g95, lsu, and ok shared only three aa substitutions at aa 100, aa 465, and aa 1026 (fig. 2). rbcv-specific nucleotide and amino acid substitutions within the 4.9, 4.8, and 12.7 kda orfs. the human coronavirus strain oc43 (hcv-oc43) lacks two orfs which potentially encode two nonstructural proteins of 4.9 and 4.8 kda (21).
furthermore, the same genomic areas are deleted in three hemagglutinating encephalomyelitis virus (hev) strains of swine (22). the fact that respiratory hcv-oc43 and ebcv strains show remarkable genomic and protein similarities as well as immunological cross-reactivities prompted us to compare the nucleotide sequences specified by the genomic region between the s and the 12.7 kda orf of ebcv, rbcv, hcv-oc43, and three porcine hev strains (fig. 3) ... (fig. 4). identical aa changes were also found in the 32 kda protein of two more rbcv strains isolated from texas and arizona cattle (data not shown). genetic comparisons among different bcvs revealed substantial differences between rbcv and ebcv strains principally within the s gene and within orfs located between the s and e genes. furthermore, genetic differences between virulent and avirulent strains were identified within the s gene, the e gene and the 32 kda orf. the salient features of genetic differences between rbcv and ebcv strains are discussed below: rbcv-specific genetic alterations in the s gene. a pairwise alignment of tgev and mhv s aa sequences revealed that the n-terminal portion of s1 which is deleted in the porcine respiratory coronaviruses (prcv) and hcv-229e, in comparison with tgev, is the region corresponding to the mhv receptor binding-site (aa 1-330) (23). the tgev receptor-binding site is in a different location (aa 500-700) and aligns with the s polymorphic region of the mhv strain. recently, it was shown that only two aa changes at the n-terminus of tgev s resulted in the loss of enteric tropism (13). the s1 amino terminus specified by rbcv strains lsu and ok contained aa changes at aa 11, aa 115, aa 118, aa 173 and aa 179 which may affect s1-mediated receptor binding. hemagglutination of chicken red blood cells (rbc) was shown to be mediated by the s glycoprotein, because purified s of the ebcv mebus strain agglutinated chicken rbc, while purified he did not (10, 24).
rbcv strains lsu and ok agglutinated mouse and rat, but not adult chicken rbc (3). therefore, aa changes within s specified by rbcv may be responsible for the inability to hemagglutinate chicken rbc. the s1a virus neutralizing (vn) immunoreactive epitope (aa 351-403) (25) was identical for all viral strains, except for a single aa change at aa 362 specified by the avirulent, cell culture-adapted strain ebcv-l9 (fig. 2). furthermore, the s1a epitopes of hcv-oc43 and the bcv mebus strain were identical (21). monoclonal antibodies (mabs) against the ebcv mebus strain cross-reacted with different animal and human coronaviruses (25). therefore, it is likely that these antibodies react with the s1a epitope. the hypervariable region of the s glycoprotein contains the s1b immunoreactive epitope which is the target for virus neutralizing mabs (25). four rbcv-specific aa substitutions at aa 510, aa 531, aa 543 and aa 578 were located within or proximal to this epitope. based on the observed aa changes, it can be predicted that mabs specific for this region may be able to distinguish between respiratory, enteric and vaccine bcv strains. the bcv s2 subunit of s induced cell fusion when it was expressed in insect cells, indicating that s2 contained membrane fusion domains (26). the hydrophobic and heptad repeat regions of s2 are believed to form the coiled-coil structure of the oligomeric s protein that has been associated with fusion activity. specifically, three aa changes within a predicted heptad region of the mhv s2 subunit were shown to be responsible for ph-dependent cell fusion (27). rbcv strains lsu and ok are highly fusogenic in cell culture (data not shown). additional experimentation is required to assess whether the aa change of ala 769 to ser immediately after the proteolytic cleavage site and the aa change of asp 1026 to gly within the heptad repeat are responsible for the extensive cell fusion induced by rbcv.
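substitutions written in the form "ala 769 to ser" are read off an alignment by comparing two sequences position by position. the following python sketch illustrates that bookkeeping only. the two short peptides and the function name `list_substitutions` are hypothetical and are not bcv s sequences, and the code assumes the inputs are already aligned, of equal length, and numbered from a stated start position.

```python
# Three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def list_substitutions(reference, variant, start=1):
    """Return substitutions between two pre-aligned protein sequences.

    Gap characters ("-") are skipped, so only residue-to-residue
    changes are reported. `start` is the number of the first column.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for offset, (ref, var) in enumerate(zip(reference, variant)):
        if ref != var and ref != "-" and var != "-":
            changes.append(
                f"{THREE_LETTER[ref]} {start + offset} to {THREE_LETTER[var]}"
            )
    return changes


# Hypothetical toy peptides, for illustration only.
reference_peptide = "MKTAYIAD"
variant_peptide = "MKSAYIGD"
for change in list_substitutions(reference_peptide, variant_peptide):
    print(change)
```

counting the entries of the returned list gives the number of amino acid differences between two strains, which is the kind of figure quoted for the strain comparisons.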
fig. 2. comparison of the predicted amino acid sequences of the bcv s glycoprotein specified by different strains. amino acids that are different in at least one strain are shown, except aa 1, aa 768 and aa 1363 which are included as reference points. * indicates unique amino acid changes for each strain. boxed amino acids are common among different strains. light-gray boxes contain rbcv-specific, dark boxes contain virulent-specific, and clear boxes contain ebcv-specific aa changes. aa 1-17 is the putative signal peptide; aa 351-403 is the s1a immunoreactive domain; aa 517-621 is the s1b immunoreactive domain; aa 452-593 is the hypervariable region; aa 955-992 is the hydrophobic region; aa 993-1032 is the heptad repeat sequence; aa 1312-1325 is the carboxy-terminal anchor sequence.

rbcv-specific genetic alterations between the s and e genes. the rbcv genomic regions between the s and e genes contained many nucleotide substitutions, deletions and insertions. hcv-oc43 and three porcine hev strains specified deletions within the 4.9 and 4.8 kda orfs, indicating that they are not essential for virus replication. similarly, the high number of mutations within the rbcv 4.8 and 4.9 kda orfs suggests that these orfs are not essential for virus replication in cell culture (table 1). the 65 nt leader of a cloned bcv defective interfering (di) rna, when mapped by mutations, could be converted rapidly to the wild-type leader of a helper virus following di rna transfection into helper virus-infected cells (28). nucleotide substitutions mapped the crossover region to a 24-nucleotide segment that starts from the last nt of the leader-mrna junction sequence and extends further downstream. the rbcv isolates lsu, ok, as well as rbcv az-26649-2 (az) and tx-671-2 (tx) isolated from california and texas cattle, respectively (data not shown), contained a four nucleotide deletion located within this 24 nucleotide segment (fig. 3).
this deletion may alter the recombination frequency between the leader and the leader-mrna junction sequence immediately upstream of the 12.7 kda subgenomic mrna, and cause either inhibition or enhancement of the putative 12.7 kda transcription and subsequent protein expression.

genetic differences between virulent and avirulent bcv strains. the s glycoprotein contained 7 aa substitutions which were common for all virulent strains (aa 33, 40, 248, 470, 965, 1241 and 1341). three mutations within the s1 portion of s caused conservative aa changes, while one non-conservative aa change of his 470 to asp was located within the s1 hypervariable region. all three mutations within s2 caused non-conservative amino acid changes. amino acid changes within s1 and s2 may affect the structure and function of the s glycoprotein and alter the pathogenetic potential of these viruses. the 32 kda orf specified by rbcv virulent strains lsu and ok as well as ebcv ly-138 contained two frame-shift mutations which resulted in a 9 aa segment near the carboxy-terminus which was different from the corresponding amino acid sequence specified by the avirulent ebcv-quebec strain (fig. 4). a similar double frame-shift mutation was found in the 32 kda orf specified by hcv-oc43 (29). these frame-shift mutations substantially increased the hydrophilicity of the carboxy-terminal portion of the 32 kda protein in virulent versus avirulent strains (data not shown). sequencing of the corresponding region of the 32 kda protein of the avirulent ebcv l9 and mebus strains as well as the 32 kda protein of other virulent strains will be needed to substantiate these differences between virulent and avirulent strains. the aa substitution of gly 53 to val in the e protein was conserved for all bcv virulent strains (f15, ly-138, ok, lsu), hcv-oc43 (30), and three different porcine hev strains (22).
this mutation may affect the ability of these viruses to invade different tissues, because e is part of the virion and is expressed at infected cell-surfaces (31, 32).

curr top microb & immunol 99, 165-200 corona and related viruses: functional domains in the spike protein of transmissible gastroenteritis virus the coronaviridae: the coronavirus surface glycoprotein the coronaviridae: the coronavirus non-structural proteins nucl acids res 16, 10881-10890

we acknowledge the technical assistance of mamie burrell with cell culture and virus propagation, and galina rybachuck with sequence analysis and design of figures. this work was supported in part by usda grant 94-3704-0926 of the national research initiative program to j.s. and k.g.k., louisiana educational support fund (leqsf) grant xrf/1995-1998-rd-b-18 to j.s. and k.g.k., leqsf grant rd-1993-1998-rd-b-04 to k.g.k., and a grant from immtech biologics, inc., bucyrus, ks. we are indebted for support by the lsu school of veterinary medicine. this publication is identified as genelab publication #gl1201.

key: cord-348896-a2mjj5dt authors: abid, nabil ben salem; chupin, sergei a.; bjadovskaya, olga p.; andreeva, olga g.; aouni, mahjoub; buesa, javier; baybikov, taufik z.; prokhvatilova, larisa b. title: molecular study of porcine transmissible gastroenteritis virus after serial animal passages revealed point mutations in s protein date: 2010-12-28 journal: virus genes doi: 10.1007/s11262-010-0562-8 sha: doc_id: 348896 cord_uid: a2mjj5dt

porcine respiratory coronavirus is related genetically to porcine transmissible gastroenteritis virus with a large deletion in s protein. the respiratory virus is a mutated form that may be a consequence of the gastroenteritis virus’s evolution. intensive passages of the virus in its natural host may enhance the appearance of mutations and therefore may contribute to any attenuated form of the virus.
the objective of this study was to characterize the porcine transmissible gastroenteritis virus tmk22 strain after passages in piglets from 1992 until 2007. a typical experimental infection, molecular characterization, and serological analysis were also carried out to further characterize and to evaluate any significant difference between strains. the sequence analysis showed two amino acid deletions and loss of an n-glycosylation site in transmissible gastroenteritis virus s protein after passages in piglets. although these deletions were positioned at the beginning of the antigenic site b of s protein, no clinical differences were observed in piglets infected experimentally either with the native virus or the mutated one. serological tests did not show any antibody reactivity difference between the two strains. in this article, we report that the s protein deletion did not affect the virus’s pathogenicity. the variety of the virus’s evolutionary forms may be a result, not only of the multiple passages in natural hosts, but also of other factors, such as different pathogens co-infection, nutrition, immunity, and others. further studies need to be carried out to characterize the mutated strain. porcine transmissible gastroenteritis virus (tgev) is an enteropathogenic coronavirus [26, 31] . it infects pigs of all ages; however, infection is most severe in newborn piglets, resulting in a fatal diarrhea [27, 33] . the disease outbreaks were observed in many countries such as japan, canada, russia, and causing considerable economic damages in swine industries [15] . the mortality of the newborn piglets can reach 100% [5, 32] . tgev has three major structural proteins: the spike (s), the nucleoprotein (n), and the membrane (m) [18, 39] . protein s is the major inducer of tgev neutralizing antibodies (abs). studies of tgev mutations were enhanced by the detection of porcine respiratory coronavirus (prcv) in the winter of 1983-1984 [25] . 
this virus was closely related to tgev [5, 8, 36] and the main difference, which was found between the two viruses, was the large deletion in the s gene (621-681 bp for the american strains and 672 bp for the european strains) [9, 22] . as a result, prcv had a deletion of 224 amino acids (european strain) or 227 amino acids (american strains) and the loss of the antigenic sites c and b in the spike s protein [9, 12, 22, 35, 36] . furthermore, some prcv strains differ from tgev by two minor deletions [28, 36, 42] . based on sequence and phylogenetic analysis, sanchez and collaborators proposed that the european prcv strains have been derived by a 672-nt deletion from an enteric tgev and the origin of prcv may be the consequence of tgev evolution by sequence deletion in the s protein [36] . although the nature of the events responsible for the genomic diversity between prcv and tgev remains an open question, both homologous and heterologous rna recombinations, and an accumulation of point mutations were proposed to be a driving force for coronavirus evolution [36] . unlike tgev, prcv replicated in the respiratory tract yet with low extent in the gut. this deletion changed the primary affinity of the virus from the digestive to the respiratory tract. in this respect, mutants of mouse hepatitis virus encoding a point-mutated or a truncated s protein have been shown to be neuron-attenuated for mice [28] . prcv constitutes a good example of genotypic and phenotypic changes as a result of sequence deletion. the infection with prcv induced the production of abs which were able to neutralize both tgev and prcv. as a result, the incidence and the severity of tgev infection have been decreased dramatically since the widespread of prcv infection in swine herds. it was proposed that prcv behaved as a natural vaccine against tgev, which makes the study of its origin and evolution interesting [9] . 
the mutation of the tgev s gene could be enhanced by intensive passages of the virus in its natural host under the influence of many environmental factors. the objective of this study was to carry out a typical experiment to compare two tgev tmk22 strains, one of which was cell culture-adapted and maintained in the laboratory for about 15 years, while the other was passaged in animals during the same period of time.

the cell culture-adapted tgev tmk22 strain (the collection of microorganism strains of the federal governmental institute, all-russian research institute for animal health, fgi arriah) was used in this study. it was detected in a 9-week-old piglet with diarrhea in a swine farm in the mid-1980s. the strain was cell culture-adapted and propagated in swine testis (st) cells as described previously [31]. these cells were found to be permissive to porcine hemagglutinating encephalomyelitis virus (hev), swine influenza virus (siv), and porcine tgev. in addition, st cells were shown to be not permissive to porcine reproductive and respiratory syndrome virus (prrsv). st cells are mostly used for the isolation of tgev. the continuous st cell culture was started from trypsinized testicular tissue from swine fetuses, as described previously [23]. cells were seeded in earle's balanced salt solution with 0.5% lactalbumin hydrolysate (lah) and 10% pig serum. after a period of adjustment, the cells were trypsinized and sub-cultured at weekly intervals. tgev was propagated in st cells with mem medium buffered with 20 mm hepes and 0.2% (w/v) sodium bicarbonate, supplemented with 3% fetal bovine serum, 1% (v/v) antibiotics, and 2 mm l-glutamine. cells were examined daily for cytopathic effects (cpes). when cpe was seen in 80% of the cell monolayer, cells and supernatant fluids were frozen and thawed three times to release intracellular virus into the medium. the fluid was clarified by low-speed centrifugation (400 × g for 10 min).
the virus was then titrated by growth of serial 10-fold dilutions in vero cells. the virus titer was estimated by the reed and muench method [29], and the virus amount per 1 ml volume was calculated. the tgev tmk22 strain was passaged annually in 2-week-old piglets during the period from 1992 until 2007. these piglets were sacrificed, and the intestinal contents were taken, clarified, and stored at -80°c until use. this tgev strain is referred to in this study as ptmk22.

rna extraction and rt-pcr. rna extraction was carried out according to the procedure of gribanov et al. [14] with modifications using gf/f nitrocellulose glass filters (whatman, england), as described previously [1]. the tgev genome was detected using the reverse-transcription polymerase chain reaction (rt-pcr) method, as described in the original article [1]. rt-pcr and dna sequencing were carried out using primers s3 and s4. the primers were located in the prcv s gene region and were able to differentiate prcv and tgev. the nucleotide and the amino acid sequences of the tgev tmk22 strains were compared with the corresponding sequences of tgev strains available in the genbank database. sequences were analyzed with the computer program mega version 4.0 [40]. in this study, go-taq polymerase (promega, moscow, russia) was used; it had shown high fidelity in the sequencing of more than 100 pcr products for the detection of porcine epidemic diarrhea virus and porcine rotavirus. nevertheless, dna fragments of the ptmk22 strain were sequenced three times to confirm the given mutations and to exclude any spontaneous mutations introduced by the dna polymerase.

to compare the pathogenicity of the two tgev strains, six 1-week-old piglets were used in this typical experiment, whereas four 6-week-old piglets were used to study the production of neutralizing abs. before use, all animals were confirmed negative for coronavirus abs by a blocking enzyme-linked immunosorbent assay (elisa) (ingezim corona differential, spain).
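the reed and muench estimate used for the virus titer can be illustrated with a short python sketch. the well counts below are hypothetical and are not data from this study, and the function name `reed_muench_log10_endpoint` is introduced here for illustration only. the sketch assumes that every dilution has a count of infected and total wells and that the 50% point is bracketed by two adjacent dilutions.

```python
def reed_muench_log10_endpoint(log10_dilutions, infected, totals):
    """Estimate the log10 dilution giving 50% infection (Reed and Muench).

    Cumulative infected counts are summed from the most dilute step
    upward and cumulative uninfected counts from the most concentrated
    step downward; the endpoint is interpolated between the two
    dilutions that bracket 50%.
    """
    # Order rows from most concentrated (e.g. -5) to most dilute (e.g. -8).
    rows = sorted(zip(log10_dilutions, infected, totals), reverse=True)
    logs = [row[0] for row in rows]
    inf = [row[1] for row in rows]
    uninf = [row[2] - row[1] for row in rows]
    n = len(rows)

    cum_inf = [sum(inf[i:]) for i in range(n)]
    cum_uninf = [sum(uninf[: i + 1]) for i in range(n)]
    frac = [
        ci / (ci + cu) if (ci + cu) else 0.0
        for ci, cu in zip(cum_inf, cum_uninf)
    ]

    for i in range(n - 1):
        if frac[i] >= 0.5 > frac[i + 1]:
            # Proportionate distance between the bracketing dilutions.
            distance = (frac[i] - 0.5) / (frac[i] - frac[i + 1])
            step = logs[i] - logs[i + 1]  # 1.0 for 10-fold dilutions
            return logs[i] - distance * step
    raise ValueError("the 50% endpoint is not bracketed by the data")


# Hypothetical example: 5 wells per 10-fold dilution.
endpoint = reed_muench_log10_endpoint(
    log10_dilutions=[-5, -6, -7, -8],
    infected=[5, 3, 1, 0],
    totals=[5, 5, 5, 5],
)
print(round(endpoint, 2))        # about -6.32
print(10 ** (-endpoint))         # TCID50 per inoculum volume
```

dividing the resulting tcid50 figure by the inoculum volume in ml gives the titer per 1 ml referred to in the text.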
for the first lot (1-week-old piglets): two piglets were inoculated with the cell culture-adapted tgev tmk22 strain; the next two piglets were inoculated with the ptmk22 strain, and the remaining two piglets were used as a control group. the piglets were inoculated orally with 5 ml of cell culture supernatant containing the tgev tmk22 strain or ptmk22 at a titer of 10^8 tcid50/ml. the control piglets were inoculated with uninfected cell culture supernatant. for the second lot (6-week-old piglets): one piglet was inoculated with the cell culture-adapted tgev tmk22 strain at a titer of 10^8 tcid50/ml, one piglet was inoculated with the ptmk22 strain at a titer of 10^8 tcid50/ml, one piglet was inoculated with a mixture of the two strains, and the remaining piglet was used as a control. all piglets were housed individually and fed with a commercial sterile milk substitute. clinical signs and rectal temperature were recorded daily. the stool samples were collected after the manifestation of diarrhea. major organs, including liver, lung, kidney, spleen, and intestine, were collected aseptically post mortem for virus genome detection using rt-pcr analysis. the detection of tgev antigens in stool samples was carried out using a commercial kit (anigen rapid tge ag test kit, south korea) according to the manufacturer's protocol.

the elisa test is a useful diagnostic method for the differentiation between tgev and prcv [5]. several tests using mabs based on the s protein epitopic differences have been developed to differentiate tgev and prcv abs [4, 6, 17, 37, 38]. sera were collected before infection and every day during the first 5 days after inoculation, and then samples were taken every 10 days. abs against tgev in serum were determined using a blocking elisa test (ingezim corona differential, spain). in brief, the coronavirus antigen was fixed to a solid support (polystyrene plate) [2, 30, 41]. serum sample was added in two wells.
after incubation, a specific peroxidase-conjugated mab against one common epitope of both coronaviruses was added to one of the wells. if the serum sample contains abs against either of the viruses, they will not permit the binding of the labeled mab to the antigen, whereas if it does not contain specific abs, the mab will bind to the antigen on the plate. in the other well, a specific peroxidase-conjugated mab against one specific epitope of tgev was added. the experiment was performed as described above; the different combinations of the results in both wells permitted us to know whether the serum sample contained abs against tgev or against prcv, or did not contain abs against either of them. the presence of anti-tgev abs was considered confirmation that the tgev infection had succeeded. after elisa testing was complete, sera were heated at 56°c for 30 min and evaluated for virus neutralizing ab against porcine coronaviruses using a standard microtiter virus neutralizing (vn) assay. serial 2-fold dilutions of sera were made in 96-well microtiter plates. the tgev strain was added to each well (approximately 100 tcid50). after 1 h of incubation at 4°c, st cells were added, and the plates were incubated at 37°c in a 5% co2 atmosphere for 3-4 days. each well was then examined for cpe. the antibody titer was determined to be the dilution of sera where 50% of the wells were infected. positive and negative controls were included in each reaction. results were presented as the reciprocal of the highest dilution of the test sample capable of neutralizing the virus.

sequence analysis of the tgev ptmk22 strain revealed a deletion of six nucleotides in the s gene fragment.
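the two-well read-out of the blocking elisa described above can be written as a small decision table. the python sketch below is an interpretation of that description, not the kit manufacturer's algorithm: it assumes each well has already been scored as blocked or not blocked (the cut-off values are not given in the text), and the function name and result labels are hypothetical.

```python
def interpret_blocking_elisa(common_blocked, tgev_specific_blocked):
    """Classify a serum from the two wells of a differential blocking ELISA.

    common_blocked: serum prevented binding of the labeled mAb directed
        against the epitope shared by TGEV and PRCV.
    tgev_specific_blocked: serum prevented binding of the labeled mAb
        directed against the TGEV-specific epitope.
    """
    if common_blocked and tgev_specific_blocked:
        # Antibodies to both the shared and the TGEV-specific epitope.
        return "tgev antibodies"
    if common_blocked:
        # Shared epitope recognized, TGEV-specific epitope not recognized.
        return "prcv antibodies"
    if tgev_specific_blocked:
        # Not expected from the assay design; flag for retesting.
        return "inconclusive"
    return "negative"


# All four combinations of well results.
for common in (True, False):
    for specific in (True, False):
        print(common, specific, interpret_blocking_elisa(common, specific))
```

the "inconclusive" branch is an added safeguard for a combination the text does not discuss.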
the deduced amino acid sequence of the ptmk22 strain revealed two amino acid deletions at positions 97 and 98 of the s protein and two amino acid substitutions at positions 92 (threonine by lysine) and 94 (asparagine by threonine), resulting in the loss of the glycosylation site upstream of the b antigenic site (fig. 1). although no other amino acid deletion was detected in the ptmk22 s protein (data not shown), the virus may have undergone other nucleotide changes elsewhere in the genome. the nucleotide sequence of the amplified s gene fragment of the mutated tgev was deposited in genbank under accession number gq907020.

to further characterize the mutated strain, typical experimental infections using the tmk22 and ptmk22 strains were carried out. clinical signs were reproduced successfully, and the virus was recovered from the fecal samples of the infected newborn piglets. overall, the clinical signs observed in piglets infected with the cell culture-adapted tmk22 and ptmk22 strains were consistent with a typical tgev infection and did not show any clinical difference in these typical experiments. for the 1-week-old piglets: clinical signs at 24 h post infection (hpi) were loss of appetite, vomiting, depression, and a yellowish, foul-smelling diarrhea with steatorrhea due to maldigestion. infected piglets developed hyperthermia (rectal temperature >40°c) at 36 hpi. severe dehydration and death were observed 5 days post infection (dpi). post mortem, the main pathological findings were a gas-filled stomach and intestine, coagulated transparent milk in the intestine, intestinal swelling, and congestion. the control piglets did not exhibit clinical signs consistent with tgev infection (table 1). for the 6-week-old piglets: clinical signs were less severe. these piglets were less susceptible to death in comparison to the newborns. the clinical signs were depression, loss of appetite, and diarrhea at 6 dpi. rectal temperature increased for the first 6 dpi and then returned to normal levels.
in contrast to the 1-week-old piglets, all the 6-week-old piglets had recovered by 7 dpi. the kinetics of these clinical signs were the same for all piglets infected with the tmk22, ptmk22, or tmk22/ptmk22 strains. no clinical signs were observed in the non-infected piglets (controls) (table 1). as shown in table 2, the genome of tgev was detected by rt-pcr at 24 hpi in stool samples of all 1-week-old piglets infected with either the tmk22 or the ptmk22 strain. the isolation of tgev in cell culture was not possible before 48 hpi. tgev antigen was detected with the immunochromatographic test strip in stool samples of the infected piglets before 48 hpi. in the acute phase of infection, tgev neutralizing abs were not detected by virus neutralization assay. all the infected piglets had died of the infection by 5 dpi. no antibody response was seen in the non-infected piglets. post mortem, the tgev genome was detected in intestine, lung, kidney, spleen, and liver samples of the piglets infected with either the tgev tmk22 or the ptmk22 strain. the vn test was carried out using antisera taken from the 6-week-old infected piglets. blood samples were taken daily after infection to monitor the tgev abs level. as shown in table 3, the highest ab titer was detected during the period from 20 to 40 dpi, and then the titer slightly declined until 50 dpi. although tgev abs were not detected at 60 dpi, they may be present in serum at a low level. in general, the tgev ab titer in infected animals during the acute phase of infection was lower than the titer during the convalescent phase [3]. the infected piglets remained healthy after the typical experiment. the infection produced adequate immunity due to the ability of the viruses (tmk22 and ptmk22) to infect the intestinal tract and consequently stimulate b cells for the production of immunoglobulin class a (iga) [3].
in addition, cell-mediated immunity plays a direct role in the protection and the recovery from infection, and the production of abs is regulated by various cytokines derived from activated mononuclear cells during the immune response [34] . although the use of 6-week-old piglets is limited to study the neutralizing ab production, stool samples were tested for the presence of viral genome. the same results were obtained for the two virus strains (data not shown). genetic variability has been observed for all rna viruses examined, and their potential for rapid evolution is increasingly recognized as the basis of their ubiquity and adaptability [16, 19] . the molecular mechanisms underlying rna virus variations are: mutation, homologous and non homologous recombinations, and genome reassortment in viruses with a segmented genome such as reoviruses. the genetic evolution of viruses is an important aspect of the epidemiology of viral diseases and sometimes causes problems in the development of successful vaccines. however, for some viruses, such evolutionary behavior may generate new variations in favor of their natural host as the case of tgev and prcv (in favor of pigs) where abs produced after primary infection with prcv may prevent pigs to die after infection with tgev. the incidence and the severity of tgev infection have decreased dramatically in the world since the widespread of prcv infection. however, tgev outbreaks still occur at several prcv antibody-positive farms [10] . such report enhances investigators to use tgev as a model to study gastroenteritis infections. experimental studies provided further signs of the ability of extensive virus passages in animals to generate new mutants and new variants [7] . however, the degree of tgev mutation in infected pigs over time is not known. 
to address this question, we examined the genetic and antigenic changes that occurred in the tgev tmk22 strain using pcr and dna sequence analysis. with these methods, recombination events in viral rna can be detected and characterized with extreme precision [11, 13, 20, 21, 24]. in this study, we have focused on the nucleotide deletion that occurred in the tgev tmk22 strain after passages in animals over a long period of time, resulting in two amino acid deletions in the s protein. although homologous rna recombination between virus genomes (tmk22 and ptmk22) in piglets infected experimentally was not proved, it is not excluded. it is possible that rna recombination among virus particles of the same strain occurs naturally and under experimental conditions. our results showed no difference in pathology caused by either virus (tmk22 and ptmk22) in either piglet population (1-week-old and 6-week-old piglets) in these typical experiments. although no clinical differences were observed in piglets infected experimentally with the native and mutated strains, the tgev s gene showed some degree of change. furthermore, the mutated strain carries an altered n-glycosylation site upstream of the antigenic site b. it has been shown in other studies that site b is fully dependent on glycosylation for proper folding [10]. the loss of n-glycosylation at the beginning of the b antigenic site needs to be further investigated by in silico 3d modeling of the s protein. further investigation needs to be undertaken to analyze host-pathogen interaction by studying protein-protein and protein-rna interactions, and by experimental co-infection of piglets with gastroenteritis pathogens, to better characterize the tgev mutated strain and to explore any interference phenomena.
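potential n-glycosylation sites are conventionally predicted from the sequon asn-x-ser/thr, where x is any residue except proline, so replacing the asparagine is enough to abolish a predicted site. the python sketch below scans a protein sequence for this motif. the peptides are hypothetical toy sequences, not the tmk22 or ptmk22 s protein, and a sequon match only indicates a potential site, not confirmed glycosylation.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the Asn in every N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptides: the variant replaces Asn 4 with Thr.
native_peptide = "ATGNGTKLSD"
variant_peptide = "ATGTGTKLSD"

print(find_sequons(native_peptide))   # [4]
print(find_sequons(variant_peptide))  # []
```

running the same scan on the native and mutated s gene fragments would show whether a predicted site is lost, which could then be examined in the structural modeling proposed in the text.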
according to the results of this study, we cannot suggest that tgev s gene mutations are responsible for changing any biological function (host-pathogen interactions) of the virus unless we carry out further biochemical, structural, and proteomic analysis.

nd: not detected

the coronaviridae pathogenesis of transmissible gastroenteritis of swine viral infections of the gastrointestinal tract viral diarrhea of man and animals diseases of swine diseases of swine

we are grateful to dr. nikolay zinyakov from viral molecular diagnostics laboratory of avian diseases for sequencing the dna samples.

key: cord-269720-o81j3d1j authors: page, kevin w.; britton, paul; boursnell, michael e. g. title: sequence analysis of the leader rna of two porcine coronaviruses: transmissible gastroenteritis virus and porcine respiratory coronavirus date: 1990 journal: virus genes doi: 10.1007/bf00570024 sha: doc_id: 269720 cord_uid: o81j3d1j

the leader rna sequence was determined for two pig coronaviruses, transmissible gastroenteritis virus (tgev), and porcine respiratory coronavirus (prcv). primer extension of a synthetic oligonucleotide complementary to the 5′ end of the nucleoprotein gene of tgev was used to produce a single-stranded dna copy of the leader rna from the nucleoprotein mrna species from tgev and prcv, the sequences of which were determined by maxam and gilbert cleavage. northern blot analysis, using a synthetic oligonucleotide complementary to the leader rna, showed that the leader rna sequence was present on all of the subgenomic mrna species. the porcine coronavirus leader rna sequences were compared to each other and to published coronavirus leader rna sequences. sequence homologies and secondary structure similarities were identified that may play a role in the biological function of these rna sequences.
transmissible gastroenteritis virus (tgev) and porcine respiratory coronavirus (prcv) belong to the family coronaviridae, a large group of pleomorphic enveloped viruses with a positive-stranded rna genome. tgev causes gastroenteritis in pigs, resulting in a high mortality in neonates (1). prcv was isolated in several european countries between 1984 and 1986 (2-4), does not cause diarrhea, and has been shown to replicate in the respiratory tract with little or no clinical signs, but is very similar antigenically and serologically to tgev (2, 4). virions from both viruses contain two envelope glycoproteins of relative molecular mass (mr) 200,000 (spike) and mr 28,000-31,000 (membrane protein) and a phosphorylated nucleoprotein of mr 47,000. cdna probes to the structural protein genes of tgev hybridized to the appropriate mrna species of prcv, suggesting a high degree of homology at the rna level (unpublished data). coronavirus proteins are expressed from a "nested" set of subgenomic mrnas with common 3' termini but different 5' extensions. the sequence of each mrna that is translated to produce viral proteins appears to correspond to the 5'-terminal region that is absent on the preceding smaller mrna species. it has been shown for the coronaviruses mouse hepatitis virus (mhv) and infectious bronchitis virus (ibv) that the subgenomic mrna species possess short "leader sequences" at their 5' ends. these sequences are not transcribed as a contiguous mrna species, but are derived from the 5' end of the genomic rna and are probably joined to the 5' end of each mrna by a process of discontinuous transcription (5-9). the leader sequence appears to be produced by a mechanism termed leader-primed transcription, in which the leader rna is transcribed independently, dissociates from the template, and then binds to the template (negative-sense strand) at specific transcriptional start sites (10, 11).
the mechanism appears to involve the recognition of consensus sequences identified on the genomic rna at those points corresponding to the 5' ends of the subgenomic mrnas. these consensus sequences may act as a binding site for the rna polymeraseleader complex (7) (8) (9) (12) (13) (14) . it has been previously postulated that a heptameric sequence, actaaac (15) (16) (17) , or a hexameric sequence, ctaaac (18) (19) (20) , may be involved in the binding of the tgev rna polymerase leader. in this paper we describe the elucidation of the leader rna sequences from the porcine coronaviruses tgev and prcv, the first leader sequence to be described from the tgev serogroup of coronaviruses. comparison of the leader rnas of tgev and prcv with published leader rnas of other coronaviruses was used to identify areas of conserved sequence and potential secondary structure that may be involved in the transcription of coronavirus subgenomic mrna species. confluent cultures of a pig kidney cell line llc-pk1 were infected with a virulent british field isolate of tgev strain fs772/70 or a british isolate of prcv strain 86/137004 at a moi of 1-10 pfu per cell. after 2 hr at 37~ the inoculum was removed and replaced with medium containing 1 ixg/ml actinomycin d to inhibit host-cell rna synthesis (21) . after a further 2-hr incubation, 25 r of [5,6-3h]uridine (amersham international plc, trk.410, 35-50 ci/mm) was added per culture bottle and the cells were incubated for a further 5 hr. the cells were lysed with guanidinium thiocyanate, the rna pelleted through 5.7 m cesium chloride and poly(a)-containing rna isolated by poly(u) sepharose affinity chromatography, as described previously (21) . two oligonucleotides were synthesized by the phosphoramidite method using an applied biosystem 381a synthesizer. one oligonucleotide, oligo 38 (5'-tggatt-catccccccaacta-y), was complementary to the nucleoprotein gene 22 bp downstream from the initiation atg codon (15) , as shown in fig. 
1, and was used for primer extension. the second oligonucleotide, oligo 58 (5'-agagatatagccacgctacactcactttac-3'), was complementary to the 5' end of the leader rna (fig. 1) and was used for northern blot analysis of viral mrna. gel-purified oligo 38 (500 ng) was 5'-end-labeled (22) using 20 u of t4 polynucleotide kinase (gibco-brl, paisley) and 20 µci [γ-32p]atp (amersham international plc, pb 10168, 3000 ci/mm). poly(a)-containing rna (1.5 µg) isolated from tgev- and prcv-infected cells was resuspended in water and heated at 60°c for 3 min. a further incubation was carried out using the two mrna preparations in 27 µl reaction volumes containing 40 u of rnasin (promega biotec, liverpool), 50 mm tris-hcl (ph 8.3), 10 mm mgcl2, 35 mm kcl, 30 mm 2-mercaptoethanol, 3 mm dithiothreitol, 4 mm dntps, 5'-end-labeled oligo 38 (120 ng), and 21 u of amv reverse transcriptase (super-rt, anglian biotech ltd, colchester) for 90 min at 42°c. formamide dye (80% formamide, 10 mm naoh, 1 mm edta, 0.1% xylene cyanol blue, 0.1% bromophenol blue) was added and the mixture boiled for 3 min and electrophoresed on a 40 cm buffer gradient sequencing gel (23). the wet gel was autoradiographed for 1 hr to locate the primer-extended products, which were excised from the gel. the labeled fragments were eluted from the polyacrylamide gel and chemically cleaved (24). samples of the cleaved products from each of the primer-extended products were electrophoresed on 6% polyacrylamide gels at 35 w constant power for two different lengths of time. tgev and prcv poly(a)-containing rna was glyoxylated and separated on a 1% agarose gel (22). the rna was transferred onto biodyne a membranes (pall p/n bnng3r 1.2 µm, gallenkamp) in 20x ssc (1x ssc = 0.15 m nacl, 0.015 m trisodium citrate, ph 7.0) for 18 hr and baked at 80°c for 2 hr. the membrane was boiled in 50 mm tris-hcl ph 8.0 for 5 min to remove glyoxal groups from the rna and prehybridized in the presence of 50% formamide for 6 hr at 42°c (15).
the viral mrna species were hybridized with 32p-labeled oligo 58 in the presence of 50% formamide for 18 hr at 42°c. the membrane was washed four times in 2x ssc containing 0.1% nadodso4 for 15 min at room temperature and autoradiographed. following primer extension, using oligo 38 at the 5' end of the nucleoprotein gene from the porcine coronaviruses tgev and prcv, labelled fragments of approximately 140 bases were produced and purified from gels. larger molecular weight species were also observed (data not shown) in minor amounts, presumably corresponding to read-through sequences upstream of the nucleoprotein gene primed from the larger mrna species. the nucleotide sequences of the two fragments, determined by chemical cleavage, were identical. the resulting nucleotide sequence of the tgev leader rna is shown in relation to the tgev nucleoprotein gene in fig. 1. the leader rna sequence diverges from the genomic sequence 15 bp upstream of the nucleoprotein gene, corresponding to the first nucleotide of the membrane protein gene stop codon (16), indicating a length of 91 nucleotides of unique sequence (fig. 1). the 91 nucleotide leader sequence of tgev and prcv has a low content of g (18%) and c (20%), and a high a (22%) and t (40%) content, with 20% of the t residues grouped in three- to four-nucleotide motifs (fig. 1). these values are similar to those observed from the tgev genome so far sequenced, except that the values for a (30.5%) and t (32.1%) are more similar on the genome than on the leader sequence. analysis of the tgev nucleoprotein nucleotide sequence (15) revealed a potential rna polymerase-leader complex binding site. the site, actaaac, is seven nucleotides upstream of the nucleoprotein initiation codon and has also been found to precede all the tgev structural protein genes and two of the three potential genes shown to be at the 5' end of mrna species (15) (16) (17).
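the base composition figures quoted above for the leader (g, c, a and t percentages) are simple per-base counts over the sequence length. a minimal sketch of that calculation follows; the function name `base_composition` and the short example sequence are assumptions made for illustration, and the example is not the tgev leader sequence.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of a, c, g and t in a nucleotide sequence.

    RNA input is accepted: u is counted as t. Characters other than
    a/c/g/t are ignored when computing the total.
    """
    seq = seq.lower().replace("u", "t")
    counts = Counter(seq)
    total = sum(counts[base] for base in "acgt")
    if total == 0:
        raise ValueError("sequence contains no a/c/g/t bases")
    return {base: 100.0 * counts[base] / total for base in "acgt"}


# hypothetical 10-nt sequence, used only to show the call
composition = base_composition("aaaaccgttt")
```

applied to a real leader sequence, the four returned values should sum to 100, as the 18%, 20%, 22% and 40% reported in the text do.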
this consensus sequence is found two nucleotides downstream of the nucleotide where the leader rna and tgev genomic sequences diverge, indicating that this sequence is involved in the leader-primed transcription of tgev mrna molecules. as can be seen from fig. 2, 4 of the 6 mrna species from the fs772/70 strain of tgev have the sequence aactaaac, of which the 5'-end adenosine residue is the next base down from the divergence point. in fact, the consensus sequence at the spike/orf1-orf2 gene junction has the sequence gaactaaac and at the nuc/orf4 gene junction has the sequence cgaactaaac, indicating that the region of the leader sequence 5' to the homology motif, actaaac, may vary between 89 and 91 nucleotides depending on the tgev gene. computer analysis has also detected a homology between the leader rna sequence and the 5' end of the negative strand (i.e., the reverse complement of the noncoding region at the 3' end of the positive strand). this is shown in fig. 3. the nucleotides on the leader rna sequence, bases 84-99, and on the negative strand, bases 136 to 152 counting from the first base after the poly(a) tail, have an overall homology of 82% and include the sequence ctaaac, which is part of the postulated tgev rna polymerase-leader complex binding site. this is very similar to the observation for ibv (25) involving sequences present at the 5' end of the ibv genome, and on the ibv leader rna sequences, with the 5' end of the ibv negative strand. the homology observed included the sequence cttaac, which is part of the postulated ibv rna polymerase-leader complex binding site ct(t/g)aacaa. an oligonucleotide, oligo 58, was synthesised that was complementary to the 5' end of the tgev and prcv leader rna sequences (fig. 1). the oligonucleotide was end-labeled and used to probe tgev and prcv mrna species that were northern blotted onto biodyne membranes. as can be seen from fig. 4, the labeled probe hybridized to all of the tgev and prcv mrna species.
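the negative-strand comparison described above rests on two operations: taking the reverse complement of the positive-strand 3' noncoding region, and locating the consensus sequence within it. the sketch below illustrates both under stated assumptions: the helper names `reverse_complement` and `find_all` and the ten-base input are hypothetical, and this is not the computer analysis used in the paper.

```python
# translation table for dna bases written in lowercase
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase or uppercase dna sequence."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every occurrence of motif,
    including overlapping ones."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# hypothetical fragment standing in for part of a positive-strand 3' region
positive_fragment = "ttgtttagtc"
negative_fragment = reverse_complement(positive_fragment)

heptamer_sites = find_all(negative_fragment, "actaaac")
hexamer_sites = find_all(negative_fragment, "ctaaac")
```

every heptamer site also contains a hexamer site one base further along, which is why the two postulated binding sequences cannot be distinguished by position alone.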
the intensity of the bands corresponding to labeled probe hybridized to the spike mrna species and genomic rna was lower than that observed for the smaller mrna species, due to less of these larger species being isolated from the poly(u) sepharose column used in the isolation of mrna. the fact that the probe hybridized to all of the mrna species showed that the leader rna sequence was present on the other rna molecules of tgev and both strains of prcv, and was not unique to the nucleoprotein mrna species. the two porcine coronavirus leader sequences were identical, indicating that the two viruses probably use the same rna polymerase-leader complex binding site, actaaac, for the synthesis of subgenomic mrna species. the seqhp comparison program of the los alamos (26) package was used to compare the leader rna sequences determined in this paper and those published for five other coronaviruses belonging to two different serogroups. the sequences were compared from the 5' ends to the point of divergence from the genomic sequences. the percentage homologies (table 1) were expressed as the number of bases matched relative to the longer of the two sequences being compared. the homology of the leader sequences fell into three groups. leader rnas from coronaviruses belonging to different serological groups had homologies in the region of 35-40%. serologically related viruses like human coronavirus (hcv) (strain oc43) and mhv (strains a59 and jhm) have about 60% homology. the third group involved different strains of mhv, a59 and jhm, which showed a homology of 91%. this observation indicates that tgev and prcv, which have a homology of 100%, are probably different strains of the same virus or that prcv has very recently diverged from tgev. in order to identify common areas of homology, the leader rna sequences from seven coronaviruses were aligned. as can be seen from fig. 5, these fell into two groups.
one group consists of mhv (strains a59 and jhm) with hcv (oc43), which have a fairly high degree of homology along their lengths. the other group consists of tgev and prcv (not shown on the diagram) with hcv (229e) and ibv, which have high homologies at their 3' ends and areas of homology at their 5' ends. there are good homologies towards the 3' ends, involving the postulated rna polymerase-leader complex binding sites and sequences upstream of these sites, between the groups, but very little if any homology between the 5' ends. (7) and strain jhm (13); avian, ibv strain beaudette (9,25). as seen from fig. 5, simple alignment did not reveal very much information about the homologies of the leader rna sequences from the different coronaviruses, except at the 3' ends involving the consensus sequences. in order to identify any potential similarities in these sequences, the secondary structures of the rna sequences in fig. 5 were analyzed. potential secondary structures of the leader rna sequences were determined using the computer program fold (27) from the uwgcg dna analysis programs (28). the coordinates determined by the fold program were displayed graphically using the uwgcg program squiggles. the potential secondary structures obtained were compared and, as can be seen from fig. 6, the overall shapes of these sequences are very similar, except for the avian coronavirus ibv. all the molecules appear to be composed of two stem-loop structures. the two mhv molecules are very similar in shape and, as seen from fig. 5 and table 1, are very homologous (91%) at the level of base sequence. the secondary structures of the coronavirus leader rna sequences are probably influenced by their biological function, which results in the similarity of these potential structures. this paper presents evidence that the nucleoprotein mrna species of tgev and the closely related porcine respiratory variant of tgev, prcv, contain an identical leader rna sequence of about 91 nucleotides.
sequencing studies on tgev have shown that the heptameric sequence actaaac occurs on the genome upstream of the genes and is believed to be the binding site for the leader of the genomic rna. this mechanism has been termed leader-primed transcription and involves not only the leader rna primer, but also consensus sequences along the genome found upstream of the genes, which act as binding sites for the leader rna primer. comparison of tgev and prcv viral products has shown very little difference between the two coronaviruses, and until recently it was impossible to differentiate between the two viruses using antisera. prcv is fully neutralized by antisera prepared against tgev, and the majority of monoclonal antibodies (mabs) raised against tgev virion proteins cross-react with prcv. however, mabs raised against antigenic determinants of the spike protein from either the virulent british isolate fs772/70 (29) or the avirulent purdue strain of tgev (30) have been identified that do not recognize prcv. these observations and the fact that the leader rna sequences from tgev and prcv are identical support the evidence that the two viruses are very similar and that prcv may have evolved as a tgev variant. comparison of the tgev leader rna sequence with the genomic sequence upstream of the nucleoprotein indicates that the length of the unique sequence of the leader sequence is 91 nucleotides. the point of divergence is two bases upstream of the actaaac sequence, supporting the evidence that the tgev rna polymerase-leader complex binding site is actaaac. four out of the six mrna species from the fs772/70 strain of tgev have the sequence aactaaac, and the 5'-end adenosine residue is the next base down from the divergence point in the nucleoprotein mrna (fig. 2). the differences in the homologies between the leader rna and sequences upstream of the consensus sequence on the genomic rna may play a role in the levels of transcription of a particular mrna species.
the mrna species of 3.0 kb has been shown to have an open reading frame at the 5' end encoding a potential polypeptide of mr 9200 (17). this particular mrna does not have the heptameric consensus sequence but has the hexameric ctaaac sequence, and it is interesting to note that it is the least abundant tgev mrna species (observed from tgev mrna in total cell lysates). hybridization of oligo 58 to the 3.0-kb mrna species showed that this species does contain the tgev leader rna, confirming that it is a true mrna species, even though it is the only tgev species not to have the heptameric consensus sequence. comparison of the seven coronavirus leader rna sequences against each other identified three groups (table 1): non-serologically related viruses had about 35-40% homology; serologically related viruses had about 60% homology; viral strains had about 90-100% homology. however, tgev and hcv (229e) have been placed in the same serological group, but have only 36% homology within their leader rna sequences, suggesting that the two viruses are not particularly related. tgev and hcv (229e) have been shown to have 46% homology at the amino acid level within their derived nucleoprotein sequences (31), whereas the homology between the derived nucleoprotein amino acid sequences for different viruses within the mhv serological group is between 80% and 98%. this indicates that the serological grouping of coronaviruses is not a particularly useful test, as similar epitopes may exist on the viral structural proteins. comparisons of nucleic and amino acid sequences from the viruses will provide a more accurate method for grouping the viruses. it will be interesting to compare the leader sequences of bovine coronavirus (bcv), which is serologically related to hcv (oc43) and mhv (a59 and jhm), with feline infectious peritonitis virus (fipv) and canine coronavirus (ccv), which are serologically related to tgev, once their sequences have been determined.
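the percentage homologies in table 1 were expressed as the number of matched bases relative to the longer of the two sequences. a minimal sketch of that scoring rule is given below. it assumes the two sequences have already been aligned, with "-" marking gaps, and it does not reproduce the alignment step of the seqhp program; the function name `percent_homology` and the example strings are hypothetical.

```python
def percent_homology(aligned_a: str, aligned_b: str) -> float:
    """Score two pre-aligned sequences as matched bases divided by the
    length of the longer ungapped sequence, expressed as a percentage.

    Gaps are written as '-' and never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for x, y in zip(aligned_a.lower(), aligned_b.lower())
        if x == y and x != "-"
    )
    longer = max(
        len(aligned_a.replace("-", "")),
        len(aligned_b.replace("-", "")),
    )
    if longer == 0:
        raise ValueError("both sequences are empty")
    return 100.0 * matches / longer


# hypothetical alignment: 8 matched bases, each sequence 9 bases long
score = percent_homology("actaaac-gt", "act-aacagt")
```

dividing by the longer sequence means that a short leader fully contained in a long one still scores below 100%, which keeps sequences of very different lengths from appearing closely related.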
the large variation in sequence length and content made the alignment of the different leader sequences difficult. however, alignment of the six different coronaviruses revealed that they fell into two groups. there appears to be some conservation of short sequence motifs between the seven leader sequences. toward the 3' end of the sequences, a tag motif is conserved in all the leaders, followed by a string of ts. in five out of seven of the sequences, this motif is taganntt. about ten nucleotides downstream of this region is a conserved ct motif, which is followed by a series of nucleotides differing in number, depending on the coronavirus, followed by the postulated rna polymerase-leader complex binding site. the largest number of nucleotides between the ct motif and the consensus sequence is found in tgev and prcv; the smallest is found in hcv (229e) and ibv. it is interesting to note that there is a five-base insert in mhv strain jhm when compared to mhv strain a59, which is also present in hcv (oc43) within this region. all the mammalian coronaviruses appear to have the motif ctaaac, except hcv (oc43), which has ctaaat. recent sequence data suggest that coronaviruses fipv and bcv have actaaac as their mrna consensus sequence. upstream of the tag motif there is an act motif occurring in six out of seven sequences. toward the 5' end of the leader rna sequences, the homologies are patchy and limited to short matches, occurring only between pairs of sequences. the area upstream of the consensus sequence has been suggested to be involved in the binding of nucleoprotein to the leader rna sequence at nucleotides 56-65 in mhv (32). it was suggested that mrna species and genomic rna form a complex with the nucleoprotein by the protein binding to or near the leader sequence attached to the rna molecules (33). secondary structure analysis of the leader rna sequences showed that all the sequences except for ibv possess a putative double stem-loop structure (fig. 6).
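the taganntt motif described above contains two unspecified positions (n), so locating it requires a degenerate match. a minimal sketch using a regular expression is shown below; the helper names `motif_to_regex` and `find_motif` and the test sequence are assumptions for illustration, only the wildcard n is handled, and the sequence is not one of the published leaders.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif in which 'n' stands for any of a, c, g or t.

    The pattern is wrapped in a lookahead so overlapping hits are reported.
    """
    body = "".join(
        "[acgt]" if ch == "n" else re.escape(ch) for ch in motif.lower()
    )
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence of motif."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.lower())]


# hypothetical sequence containing two instances of the degenerate motif
hits = find_motif("cctagacgttttagatcttaa", "taganntt")
```

the same helpers can be pointed at the fixed motifs named in the text (tag, ct, act, ctaaac) by passing them unchanged, since a motif without n compiles to a literal match.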
in the case of the mammalian coronaviruses, the consensus sequences and upstream regions of homology are on the second stem-loop structure, leaving the possibility that the rna-dependent rna polymerase could interact with the first stem-loop structure. the ibv consensus sequence is present on the free 3' end of the single stem-loop structure, possibly leaving the single stem-loop structure to interact with the polymerase. virus infections of vertebrates (eds) coronaviruses molecular cloning: a laboratory manual we thank miss k. mawditt, of this laboratory, for synthesizing oligos 38 and 58 and dr. s. f. cartwright, central veterinary laboratory, weybridge for prcv strains 86/137004 and 86/135308. this work was supported by a research contract from the biomolecular engineering programme of the commission of the european communities, contract no. bap-0235-uk(hi). key: cord-307580-nokd5kmx authors: yang, guang; che, xibing; gofman, rose; ben-shalom, yossi; piestun, dan; gafny, ron; mawassi, munir; bar-joseph, moshe title: d-rna molecules associated with subisolates of the vt strain of citrus tristeza virus which induce different seedling-yellows reactions date: 1999 journal: virus genes doi: 10.1023/a:1008105004407 sha: doc_id: 307580 cord_uid: nokd5kmx citrus tristeza virus (ctv) strains were previously catalogued as seedling-yellows (sy) and non-sy (nsy) types, according to their yellowing and stunting effects on indicator seedlings. among subisolates of the vt strain, which were selected from chronically infected alemow plants, there was a correlation between the presence of 2.4-, 2.7- and 4.5-kb d-rnas, and sy and nsy reactions, respectively. similarly, plants infected with mor-t subisolates, which cause sy, contained d-rnas of 2.6 to 2.8 kb, while nsy subisolates from recovered sour orange tissue contained a major d-rna of 5.1 kb. plants harboring the 2.7-kb d-rna were protected against challenge inoculation with a subisolate harboring the 4.5-kb d-rna.
this study suggests that the nsy reaction results either from the absence of sy gene(s) in the genomes of certain ctv strains or through the suppression of the effects of sy gene(s) by d-rnas with 5′ parts larger than 4000 nt. citrus tristeza virus (ctv) (1, 2), a member of the closterovirus group and the closteroviridae family (3-7), is an important pathogen, causing considerable economic losses to citrus industries worldwide. citrus trees infected with ctv display two main types of disease: (i) quick decline of sweet oranges (swo) (citrus sinensis l.) and of some other species grafted on the sour orange (c. aurantium) rootstock (8); and (ii) stem pitting of grapefruit (c. paradisi) and pummelo (c. grandis) (9). other manifestations of infection with ctv include the seedling-yellows (sy) reaction (9-12), which is primarily a disease of experimentally inoculated plants but which might also be encountered in the field in top-grafted plants. seedlings of sour orange, lemon (c. limon) and grapefruit become chlorotic and stunted when inoculated with ctv-sy isolates, but no symptoms are elicited when swo or mandarin (c. reticulata) is inoculated (1, 13). the ctv-sy phenomenon is one of the long-standing enigmas in citrus virology. the early studies of mcclean & van der planck (9), fraser (10) and wallace (11) all suggested a complex aetiology of the ctv-sy disease. there have been reports of spontaneous recovery from sy infection by sour orange plants which initially showed sy symptoms, and of the elimination of the sy causal agent by the passage of sy-inducing ctv subisolates through sy-sensitive citrus hosts such as grapefruit and sour orange (12), which has led to the emergence of non-sy (nsy) isolates.
these phenomena have given rise to the hypothesis that the ctv-sy reaction is caused by two separate components: the ctv agent, capable of autonomous replication and responsible for the quick decline and the lime reaction; and a second component, responsible for the sy reaction and able to replicate only in plants harboring the ctv component. the ctv particles contain a single-component positive-stranded genomic rna of 19296 nt for the florida isolate, t36 (14), and of 19226 nt for the vt strain from israel (15). the genomes of these ctv strains showed considerable sequence deviation within the 5′ half, but were found to have similar organization and to encompass 12 orfs which potentially code for at least 17 protein products. in addition to the large replicative form (rf) rna molecule, the infected plants contain a nested set of at least nine smaller species of 3′-co-terminal single- and double-stranded subgenomic rnas (sgrnas). these sgrnas correspond to the 3′-terminal orfs (16, 17). cloning of the vt strain of ctv revealed the presence of several defective (d) rnas of various sizes, composed of the 5′ and 3′ termini of the genomic rna with extensive internal deletions, along with the full-length virus. the sizes of the termini varied among species, with minimal lengths of 442 nt and 858 nt from the 3′ and the 5′ termini, respectively, resulting in different sizes of d-rnas with different junction sites (18, 19). inoculation of vt on the sour orange indicator resulted in sy symptoms (20). later infections of sour orange seedlings by grafting with ctv-vt infected alemow budwood resulted in inconsistent sy reactions, and not all plants showed the sy symptoms. recently, we selected subisolates of two ctv strains, vt and mor-t (21), which differed in their sy reactions on sour orange seedlings. the present paper reports the association of d-rnas with 5′ termini larger than 4000 nt with vt and mor-t subisolates which do not elicit the sy reaction.
d-rnas may be involved in the long-standing enigma of the complex etiology of the sy-ctv reaction. the vt strain was originally isolated in 1970 from a swo cv. valencia tree grafted on sour orange. the tree showed advanced quick-decline symptoms. inoculation of sour orange plants with the vt inoculum maintained in sour lime caused typical sy symptoms (20). later passages of the vt strain from sour lime and alemow plants to sour orange often resulted in inconsistent sy reactions: not all sour orange seedlings showed the sy symptoms, even when inoculum from a single alemow plant was used to infect groups of plants from a single seed source (bar-joseph, unpublished). subisolates of ctv-vt (table 2) were randomly selected in 1994 from chronically infected alemow plants which had been graft inoculated several years earlier (1988 to 1994) with different passages of this strain. the vt subisolates were maintained in a propagation glasshouse with temperatures ranging between 15 and 35 °c. the sy reaction was assayed by grafting chip buds from infected alemow stems onto sour orange seedlings grown in a temperature-controlled glasshouse facility with incandescent illumination to complete 20 h of light, and two temperature regimes (tr) of 26/18 °c or 29/21 °c for the normal and the semi-warm tr, respectively. in both trs the high and low temperatures were maintained for 8 and 12 h, respectively, and the adjustment from the high daytime to the low night-time level and vice versa took 2 h. the sy reactions were recorded 8 and 16 weeks after inoculation, for the normal and semi-warm tr, respectively. the mor-t isolate originated from a declining minneola tangelo tree (21). the virus was propagated in alemow and was used to inoculate a group of sour orange seedlings, some of which were inarched with the ctv-tolerant rootstock go-tou. sour orange twigs and leaves showing sy and sy recovery, respectively, were used to infect sour orange and alemow seedlings.
double-stranded (ds) rnas were isolated from 5-7 g of alemow or sour orange tissues, according to dodds and bar-joseph (22). the rnas were separated by electrophoresis in formamide-formaldehyde denaturing 1.1% agarose gels, prepared in mops buffer, and transferred to hybond n membranes. the hybridization probes consisted of a 611-bp and a 762-bp cdna fragment from the 3′ and 5′ ends of the ctv-vt genome, respectively (15). the dna probes were either non-radioactively labeled using the gene images random prime labeling module kit from amersham or radioactively labeled with 32p according to mawassi et al. (17). rna probes labeled with 32p-utp were synthesized, with the riboprobe system-t7 kit (promega) according to the manufacturer's instructions, from cdna fragments of 611 bp and 762 bp of the ctv-vt 3′ and 5′ ends, respectively, cloned in pgem (promega). antibodies for elisa capture were prepared in sheep primed with recombinant ctv coat protein (rctv-cp) antigen and boosted with a partially purified ctv preparation. the second antibodies were obtained from egg yolks of chickens immunized with rctv-cp. the elisa procedure for ctv viral antigen quantification in different tissues, which were soaked overnight in the antibody-coated elisa wells, was according to bar-joseph et al. (23). the cdnas were prepared from dsrna templates of vt5 and vt12, with primers p1 and p2 for the first-strand synthesis, and primers p3-p4 and p5-p6 for nested and direct pcr amplification (table 1). the cdna fragments were separated by electrophoresis on 1% agarose gel. the bands were excised from the gel and tested with the restriction enzymes saci and nsii (promega). for sequence analysis we used primers p7 and p9; p10 and p11; p10 and p8 to obtain three cdna fragments located at orf1 (1300-2486), orfs 9 and 10 (17260-17857) and orfs 9, 10 and 11 (17260-18397), respectively.
the cdna fragments were cloned into the puc 57/t (fermentas) and sequenced from both sides by using sequenase version 2 from usb. sequences of at least 150 bases were read from the 5′ and 3′ termini of each of the cdna fragments. the dsrnas from alemow plants infected with two mor-t subisolates, designated #a and #b for sy-recovered and sy-reacting plants, respectively, were poly-a tailed and used for first-strand cdna synthesis with primer dt14v (table 1) and for second-strand synthesis with primers p9 and p8, for nested pcr amplification of the viral 3′ end, and with primers p12 and ad for the viral 5′ end. the cdna fragments were separated by electrophoresis on 1% agarose gel and cloned into puc 57/t (fermentas). sequencing from both sides of the 3′ fragments was performed by using sequenase version 2 from usb, and the 5′ sequence was determined with the aid of an automatic sequencing machine. two groups of 9-month-old alemow seedlings were graft inoculated at heights of 25-30 and 30 cm, with two chip buds from alemow plants infected with vt5 or vt12, respectively. two weeks post-infection (wpi), the plants were pruned and allowed to develop two side branches. tests for the presence of the specific d-rnas were conducted after 10 wpi. the plants were challenged at 20 wpi by top grafting with stems infected with the reciprocal subisolates. two lateral buds were allowed to sprout from each of the protected plants, and leaf and stem bark tissue were tested for the presence of d-rnas by northern blotting. biological characterization of vt and mor-t subisolates hybridization of an approximately 0.7-kb cdna probe or riboprobe from the 5′ end of the vt genome with dsrna extracts from alemow plants revealed the presence of the large rf and the low-molecular-weight tristeza 5′-corresponding rna molecules (lmt) (18) and d-rnas.
vt-subisolates 6-8 and 13, and 1, 5, 9 and 10, with apparently similar sy reactions, showed the presence of two types of d-rnas, of 2.4 kb and 2.7 kb, respectively. the three nsy subisolates (3, 4 and 12) showed the presence of a 4.5-kb d-rna (fig. 1a and table 2). the hybridization patterns of dsrnas extracted from sour orange seedlings infected with vt subisolates vt12 (nsy) and vt5 (sy) are shown in fig. 2b. only weak or no hybridization signals of genomic and/or defective rna could be located in bark and leaves from the sour orange plant which showed severe sy compared with those from the nsy plant. hybridization of dsrnas from alemow plants inoculated with mor-t subisolates #a1 (nsy) and #b1 (sy) showed the presence of major large (ca. 5.1 kb) and small (ca. 2.6 kb) d-rnas, respectively (fig. 1c). one of the sy mor-t subisolates, #c1, showed only weak bands of d-rna molecules compared with the nsy subisolate #e1, which showed the major d-rna of ca. 5.1 kb (fig. 1d, lane 1). sequence analyses revealed that sy subisolate #b1 contained two d-rnas of 2634 and 2815 nt with junctions of their 5′ termini located at positions 1772 and 1521, whereas the nsy subisolate #a1 contained a major d-rna of 5125 nt, with the junction of the 5′ terminus located at position 4376 (fig. 3b). the hybridization of the vt 5′ probe with different vt and mor-t subisolates suggested a close relationship between their genomic rnas. in order to examine the genomic composition of the vt5 (sy) and the vt12 (nsy) subisolates, we compared the sequences of termini of their genomes by means of nested rt-pcr and sequencing analyses. primers p1 and p2 were used for first-strand cdna synthesis and primers p3, p4, p5 and p6 (table 1) for amplification. the resulting cdna fragments for both subisolates gave the expected lengths for the 5′ (8-709) and 3′ (18611-19227) ends of their genome.
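because a d-rna is composed of a 5′ part and a 3′ part of the genome joined at a junction, the total length and the 5′ junction position reported above for the major d-rna of nsy subisolate #a1 imply the length of its 3′ part. the sketch below shows that arithmetic under two labeled assumptions: that the 5′ part runs from genome position 1 to the junction position, and that no extra non-viral nucleotides sit at the junction unless a count is supplied. the function name `three_prime_part_length` and its parameters are hypothetical.

```python
def three_prime_part_length(
    total_nt: int,
    five_prime_junction: int,
    junction_insert_nt: int = 0,
) -> int:
    """Length of the 3' part of a d-rna.

    Assumes the 5' part spans genome positions 1..five_prime_junction,
    so its length equals the junction position. junction_insert_nt is
    the number of extra nucleotides at the junction, if any.
    """
    length = total_nt - five_prime_junction - junction_insert_nt
    if length < 0:
        raise ValueError("junction and insert exceed the total d-rna length")
    return length


# values quoted in the text for the major d-rna of subisolate #a1
a1_three_prime = three_prime_part_length(total_nt=5125, five_prime_junction=4376)
```

under these assumptions the 3′ part of the 5125-nt d-rna is 749 nt, which is longer than the minimal 3′ terminus of 442 nt mentioned earlier for vt d-rnas.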
restriction analysis of these products with saci and nsii gave restriction fragments of identical size (not shown). sequence analyses of internal regions, at least 150 nt in length, of three cdna fragments positioned at different regions of the vt genome (positions 1300-2486, 17260-17857 and 17260-18397) did not reveal any sequence deviation between the products obtained from the dsrnas of the vt5 (sy) and the vt12 (nsy) subisolates (not shown). the possibility of interference between two vt subisolates, vt5 and vt12, harboring the 2.7- and the 4.5-kb d-rnas, respectively, was tested in alemow plants. the dsrnas from plants which had first received a protective inoculation with either the vt5 or the vt12 subisolate and were later challenged by top grafting with the reciprocal subisolate were hybridized with the 5′-specific probe. at 18 weeks post challenge inoculation (wpci), the basal parts of each combination had predominantly the d-rnas of the protective isolate (not shown). later tests at 41 wpci showed only the 2.7-kb d-rna in the basal parts of plants protected with vt5 (fig. 2, lanes 3, 4 and 6). plants protected with vt12 showed the presence of either a conspicuous or a weak band of the challenging 2.7-kb d-rna in addition to the 4.5-kb d-rna (fig. 2, lanes 5 and 7, respectively). sour orange seedlings infected with alemow tissues from the interference experiments, which harbored both the 2.7- and the 4. stronger elisa titers and higher dsrnas concentrations (fig. 2b, lane 3). biological and molecular characterization of 11 vt subisolates, which were randomly selected from chronically infected alemow plants, revealed the presence of eight sy and three nsy subisolates. the vt subisolates caused similar symptoms and comparable elisa reactions in alemow plants (not shown). the virus titers were considerably higher in sour orange plants infected with nsy than in those infected with sy subisolates.
these differences were consistent among plants which were maintained under different trs (table 2). low virus titers or the absence of virus (indicated by negative reactions on indicator plants) in sour orange leaves and roots showing severe sy symptoms suggest the possibility that the sy isolates emit a long-distance signal for a hypersensitive reaction. a similar situation has been previously observed in mature trees infected with ctv-mor-t, where the collapse of the sweet orange/sour orange combination often preceded the spread and redistribution of the virus towards the upper parts of the infected trees (21). the profound differences among the sour orange reactions to the various vt-subisolates were associated with the presence of different major d-rnas. the nsy subisolates, 3, 4 and 12, showed the presence of a major band of 4.5-kb d-rna, whereas the eight sy subisolates, 1, 5-10 and 13, showed the presence of two smaller d-rnas of 2.4 and 2.7 kb, with no apparent difference in the intensity of the sy reaction to subisolates which contained either of the smaller d-rnas. infection of sour orange with tissues from alemow plants concomitantly infected with mixtures of vt5 and vt12 resulted in reactions ranging from sy to nsy, with virus titers depending on the relative concentrations of the 2.7- and 4.5-kb d-rnas in the inoculum source. previously, we showed variations in the presence of the 2.4-, 2.7- and 4.5-kb d-rnas in alemow plants infected with budwood from a single vt-infected source plant (18). differences in d-rna populations might have accounted for the previously noticed inconsistencies in the sy reaction of sour orange plants infected with the vt strain (bar-joseph, unpublished). the selection of vt subisolates which show a more consistent sy reaction was correlated with the presence of a major type of d-rna (table 2).
one probable reason for obtaining apparently stable subisolates was their selection from chronically infected plants (> 2-3 years after inoculation) at a time when a single type of d-rna had become dominant. (fig. 3). a 16-nt sequence, 5′-gaaaactaatttatca, with no homology to other regions of the ctv genome was found at the junction site (fig. 3). a different short sequence, probably of host origin, had previously been observed at the junction site of the 2.4-kb d-rna (19). the ctv-sy phenomenon is one of the longstanding enigmas in citrus virology. the finding that both the ctv and the ctv-sy diseases could be transferred by mechanical inoculation of preparations of ctv particles (26,27) raised the question (28) of the dual-component theory of the causal agent of the ctv-sy disease (12). dodds et al. (29) noted an association between two dsrnas of about 0.8 and 2.7 kb and swo trees infected with sy subisolates. molecular characterization associated the 0.8-kbp dsrna with the replicative subgenomic rna coding for orf11 (940 nt) (16, 17, 30, 31), and hybridization with a 3′-specific probe did not reveal quantitative differences in the amounts of the 0.8-kbp dsrnas from sy and nsy plants (not shown). moreover, low-molecular-weight d-rnas of 2.4 kb were located in alemow infected with nsy isolates mik-t and ach-t (32) (not shown). ctv isolates were previously classified by a variety of criteria into subisolates which differed in host reactions, vector transmissibility and dsrna patterns (29,33-37). the variability among subisolates was considered as an indication of the high frequency of mixed ctv infections.
d-rnas were previously implicated in the variability between the dsrna patterns of parental isolates and their subisolates (5, 35), and the present findings indicate a correlation between certain d-rnas and host reactions, and support a working hypothesis that the nsy reaction results either from the absence of sy gene(s) or through the suppression of their effects by d-rnas with 5′ parts larger than 4000 nt. the genomic and d-rna fragments of the two differentially reacting vt subisolates were found to show a complete sequence identity. nevertheless, the possibility that a minimal sequence deviation between other parts of their genomes is involved in these biological differences cannot at present be completely ruled out. moreover, the question of the mechanism that causes sy symptoms in sour orange tissues which contain only low concentrations of viruses or d-rna remains to be answered. d-rnas have been isolated from a broad spectrum of animal viruses and, more recently, also from a large number of plant viruses (for recent reviews, see (38)). different d-rnas have previously been reported to have different effects on disease expression: while d-rnas of tombusviruses had attenuating effects on infection (39, 40), the d-rnas associated with the turnip crinkle virus tended to increase the severity of symptoms (41) and the d-rnas associated with broad bean mottle virus had no effect on some host plants but intensified the severity of symptoms in others (42). the correlation between the sy reactions of sour orange seedlings and the genomic composition of the d-rnas in the alemow inoculum supports the notion that the host type is a major determinant of the biological effects of d-rnas (43).
key: cord-302584-fwdpzv85 authors: zhu, ying; liu, mo; zhao, weiguang; zhang, jianlin; zhang, xue; wang, ke; gu, chunfang; wu, kailang; li, yan; zheng, congyi; xiao, gengfu; yan, huimin; zhang, jiamin; guo, deyin; tien, po; wu, jianguo title: isolation of virus from a sars patient and genome-wide analysis of genetic mutations related to pathogenesis and epidemiology from 47 sars-cov isolates date: 2005-01-01 journal: virus genes doi: 10.1007/s11262-004-4586-9 sha: doc_id: 302584 cord_uid: fwdpzv85 severe acute respiratory syndrome (sars) caused by sars-associated coronavirus (sars-cov) is a fatal disease. prevention of future outbreaks is essential and requires understanding pathogenesis and evolution of the virus. we have isolated a sars-cov in china and analyzed 47 sars-cov genomes with the aim of revealing the evolution trends of the virus and providing insights into pathogenesis and the sars epidemic. a specimen from a sars patient was inoculated into cell culture. the presence of sars-cov was determined by rt-pcr and confirmed by electron microscopy. virus was isolated, followed by the determination of its genome sequence, which was then analyzed by comparison with 46 other sars-cov genomes. genetic mutations with potential implications for pathogenesis and the epidemic were characterized. this viral genome consists of 29,728 nucleotides with overall organization in agreement with that of published isolates. a total of 348 positions were mutated on 47 viral genomes. among them 22 had mutations in more than three genomes. hot spots of nucleotide variations and unique trends of mutations were identified on the viral genomes.
mutation rates were different from gene to gene and were correlated well with periodical or geographic characteristics of the epidemic. in november 2002, the first case of a novel infectious disease named severe acute respiratory syndrome (sars) suddenly appeared in southern china [1]. this illness emerged and rapidly spread to different areas of asia and then other countries around the world with a high morbidity (about 25% required intensive care) and 9.6% fatality [2]. in march 2003, the world health organization (who) made an unprecedented international effort by organizing world-leading laboratories to find the causative agent. this effort resulted in the declaration made simultaneously by three research groups that a new sars-associated coronavirus (sars-cov) was the pathogen of this disease [3] [4] [5]. when the outbreak of sars came to an end in july 2003, it had caused a cumulative total of 8437 cases and 813 deaths worldwide [6]. since the discovery of sars-cov, progress in the study of this virus has been swift, and the complete viral genome was sequenced [7]. although the definition of a sars case still largely relied on clinical and epidemiological criteria, diagnostic tests based on the detection of viral rna and proteins have been developed [8], along with the development of vaccines [9]. results from both phylogenetic analysis and epidemiological studies suggested that sars-cov was of animal origin, most likely from himalayan palm civets, ferrets and raccoon dogs [10] [11] [12] [13]. as a member of the coronaviridae family, sars-cov is an enveloped, positive-stranded rna virus. it harbors 23 coding sequences, including 4 primary structural proteins (nucleocapsid protein n, spike protein s, membrane protein m, and small envelope protein e); 5 non-structural proteins (x1, x2, x3, x4, x5); and 1 polyprotein comprising two orfs (orf1a and orf1b).
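the 9.6% fatality quoted above can be checked against the cumulative totals given in the same paragraph (8437 cases, 813 deaths). the following is a minimal sketch of that arithmetic; the function name is ours and is not taken from the paper.

```python
def case_fatality_percent(deaths: int, cases: int) -> float:
    """Return deaths as a percentage of reported cases."""
    if cases <= 0:
        raise ValueError("cases must be a positive number")
    return 100.0 * deaths / cases


# cumulative worldwide totals quoted in the text for july 2003
cfr = case_fatality_percent(deaths=813, cases=8437)
print(f"{cfr:.1f}%")  # 9.6%
```

the two figures in the text are therefore consistent with each other.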
the polyprotein is autocatalytically processed to produce a group of proteins including proteases (plppro and 3clpro), rna-dependent polymerase (pol), rna helicase (hel), and proteins of unknown function [4, 5, 7]. as with other rna viruses, its most striking characteristic is a high rate of genetic mutation [11, [14] [15] [16] [17] [18]. despite the fact that sars-cov can cause an atypical and fatal form of pneumonia, the genome structure, gene expression pattern, and protein profiles of the virus are similar to those of other conventional coronaviruses [17], which are only responsible for mild respiratory tract infections in a wide range of animals including humans, pigs, cows, mice, cats, and birds [10, 19]. it is possible that distinct patterns of several genes and unique variations in the sars-cov genome may contribute to its severe virulence or pathogenesis. the mechanism of sars-cov pathogenesis may involve both direct viral cytocidal effects on the target cells and immune-mediated mechanisms. potential mutability of the viral genome may pose problems in the control of future sars epidemics. in this report, we describe the isolation of a new sars-cov strain (whu) from a patient in hubei province, china, during the late period of the sars outbreak. the complete genome sequence of the whu isolate was determined and compared with those of 46 other sars-cov strains whose complete genomic sequences were available at the time of analysis. a comparative study of the genetic characteristics and nucleotide variation of all known sars-cov offers insights into the functions of the viral genes and reveals the evolution trends of the virus. it would also provide a basis for clinical diagnosis and for the future development of potential drugs and vaccines against sars-cov infections. the sars patient was an 18-year-old male from jiayu county, hubei province, china. he worked in beijing at the time the sars outbreak was occurring.
he came back to hubei province and became ill on april 29th, 2003 with fever and atypical pneumonia, and was admitted to hospital for isolation and treatment on may 3rd 2003. veroe6 cells were inoculated with specimen obtained from the sars patient. the presence of the sars-associated coronavirus in infected cell cultures was determined by the appearance of cytopathic effects (cpe) as well as by rt-pcr amplification using primers (primer-1/primer-2 and primer-3/primer-4; table 1 ) specific to the sars-cov. viral particles were examined under electron microscope. viral rna was extracted from infected veroe6 cells based on the procedures described by the manufacture (invitrogen, carlsbad, ca). the first strand of the viral cdna was synthesized from extracted viral rna by reverse transcription pcr using random primers provided by the manufacture (promega, madison, wi). double-stranded dna fragments were produced by pcr amplification of the viral cdna using 10 pairs of specific primers (primer 5 to primer 24; table 1 ) designed to cover entire viral genome based on the sequences of sars-cov strain hku-39849 (accession number ay278491). each of the pcr products was cloned into vector pgem-t, respectively. random clones were selected for dna sequencing analysis. sequences representing the entire viral genome was fully assembled and edited by dnasis software programs. nucleotide sequences of complete genome of the sars-cov isolate (whu) were deposited to genbank (accession number ay394850). the complete genome sequences of all 47 sars-associated coronaviruses were downloaded from genbank (table 2 ). homology searches for the dna sequences were conducted and their deduced amino acid sequences were analyzed through the public database with the blast search program provided by the national center for biotechnology information (ncbi). sequence alignment was performed using software clustalw and further analyzed using software bioedit. 
nucleotide sequences of the entire genome of the newly identified whu strain, along with those of 46 other sars-cov isolates released in genbank, were aligned with the clustalw software program. phylogenetic trees were created for all nucleotide sequences by neighbor-joining and parsimony methods. sequences were analyzed with reference to the trees to reveal character states relevant to phylogenetic branching. during the late period of the sars outbreak in 2003, three patients were identified as probable sars cases in hubei province, an area of china with few sars cases. in order to study the disease caused by sars-cov, we obtained a specimen from one of the patients. seven days after inoculation of veroe6 cells with the patient specimen, cpe appeared on the infected cells (fig. 1), indicating the presence of an infectious agent. two specific amplicons were detected by rt-pcr amplification using extracted viral rna as template when two pairs of sars-cov specific primers were used, respectively (data not shown). these results indicated that the presence of a sars-cov in the specimen was highly probable. coronavirus-like particles were observed when we further examined infected cells under the electron microscope (data not shown). in addition, sars-cov antibodies were detected in the patient's serum. taken together, these results provided substantial evidence to suggest that this patient was infected by sars-cov, named the whu strain.
table 2. accession numbers of genomic sequences of 47 sars-associated coronaviruses released in genbank
isolate          accession number
urbani           ay278741
twy              ap008581
tws              ap006560
twk              ap006559
twj              ap006558
twh              ap006557
cuhk-w1          ay278554
taiwan tc3       ay348314
taiwan tc2       ay338175
taiwan tc1       ay338174
twc              ay321118
frankfurt        ay291315
bj04             ay279354
bj03             ay278490
bj02             ay278487
zj01             ay297028
tor2             ay274119
tw1              ay291451
bj01             ay278488
shanghai qxc1    ay463059
shanghai qxc2    ay463060
after identification of the whu strain, we isolated the virus and determined the complete nucleotide sequence of its genome (accession number ay394850). since this virus was the only sars-cov that had ever been isolated and sequenced from hubei province, we carried out a detailed sequence analysis of its entire genome. results from sequence analysis indicated that the genome of the whu strain consisted of 29,728 nucleotides with a two-nucleotide deletion at positions 27,825 and 27,826. phylogenetic analysis was conducted with the genome sequences of the whu strain and those of all 46 sars-cov isolates whose genomic sequence information was fully available in the public databases (table 2). both phylogenetic study and sequence analysis indicated that the overall genome organization and predicted proteins of the whu isolate were in agreement with published studies on other sars-cov isolates (fig. 2). like all sars-cov isolates, the whu strain belongs to a new group of coronavirus [3]. however, the whu isolate with a two-nucleotide deletion was genetically distinct from most of the published sars-cov isolates, but closely related to the twc strain (fig. 3). to investigate the variations of nucleotide sequences among sars coronaviruses, we performed a genome-wide analysis of genetic mutations on all 47 sars-cov genomes. results indicated that a total of 348 positions on the 47 viral genomes had alternative nucleotides. among them, 22 positions had mutations in more than three viral genomes (table 3 and fig. 4). our next step was to determine whether the high mutability had any implications linked to the viral genes or their functions (fig. 4 and table 3). after further comparison and analysis of the viral sequences, we realized that the polyprotein gene (orf1a and orf1b) had the highest variation rate among all genes. this region not only carried 11 mutations, but also had the second most variable positions (3852 and 11,493).
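the genome-wide count of variable positions described above can be illustrated with a short sketch. it assumes the genomes have already been aligned to equal length (for example with clustalw) and counts, for each alignment column, how many genomes differ from the most common base. this is our assumption about how a "mutation rate" per position could be tallied; the paper does not state its exact procedure, and the isolate names and sequences below are toy data. the cut-off is left as a parameter because the text says "more than three" genomes while table 3 lists positions with a rate of 3.

```python
from collections import Counter


def mutation_counts(aligned: dict[str, str]) -> dict[int, int]:
    """Map 1-based alignment position -> number of genomes that differ
    from the most common character in that column (gaps count as a state)."""
    seqs = list(aligned.values())
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    counts: dict[int, int] = {}
    for i in range(length):
        column = [s[i] for s in seqs]
        _, n_major = Counter(column).most_common(1)[0]
        n_variant = len(column) - n_major
        if n_variant:
            counts[i + 1] = n_variant
    return counts


def hot_spots(counts: dict[int, int], min_genomes: int = 3) -> list[int]:
    """Positions mutated in at least `min_genomes` genomes."""
    return sorted(pos for pos, n in counts.items() if n >= min_genomes)


# toy alignment of five hypothetical isolates (not real sars-cov data)
toy = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGT",
    "iso3": "ACCTACGA",
    "iso4": "ACCTACGT",
    "iso5": "ACCTTCGT",
}
counts = mutation_counts(toy)
print(counts)                            # {3: 2, 5: 1, 8: 1}
print(hot_spots(counts, min_genomes=2))  # [3]
```

counting against the column majority rather than against one reference genome is a design choice; with a fixed reference, a position where the reference itself carries the rare base would be reported with a much higher count.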
orf1b gene contains additional two residuals (17,564 and 19 ,084) at which 7 viruses were mutated. we also noticed that the s gene had a high mutability with residual 22222 mutated in 7 viruses, residual 21721 in 6, and residual 24933 in 3. two positions with high mutation rate were identified within the m gene. one was located at the most variable residual 26477, at which 20 viruses were mutated. the other one was residual 26600, at which 6 viral genomes were changed. e gene and n gene had one mutation spot at residual 26203 and 28276, respectively. among five nonstructural genes, x4 had one mutation site at residual 27243 with mutation rate of 5, while x5 gene had two mutation spots at residual 27813 and 27827 with mutation rate of 7 ( fig. 4 and table 3 ). based on the recommendations from who [6] , all sars cases can be divided periodically into early-period case, mid-period case, and late-period case (table 4 ). in this study, we proposed all 47 known viral isolates into two groups, early-mid period and mid-late period group (table 5) . based on results from sequence analysis, we realized that there were some correlations between genetic mutations of the virus and periodical or geographic characteristics of the outbreak. several residuals (9404, 9854, 17564, 19838 (tables 3 5) . in addition, some genetic mutations were linked to certain geographic regions where the viruses isolated. for instance, high genetic mutation rate at position 3852 was mainly found in viruses isolated from taiwan. mutations at residual 26203 occurred in most taiwan isolates (60%), but not found in any isolates identified from other regions around the world. moreover, all three viral strains (fra, sod and frankfurt) isolated from europe had mutations at the same residuals, 2557, 11448 and 24933, while the rest isolates showed no changes in these positions (tables 3 and 5 ). although the sars epidemic ended after 6 months spreading, many important questions remain unclear. 
what is the natural reservoir of sars-cov; where and how the virus crossed the barriers between its reservoir and human to initiate reservoir-human transmission, and subsequent human-to-human infection. it was proposed that the natural reservoir of sars-cov was animal originated [10, 11, 13] , most likely himalayan palm civets [12] . this was not a surprise, since many fatal human viruses including hiv and influenza virus were originated by transmission from animals. hiv pandemic had happened as a consequence of the combination of transmission of sivcpz from chimpanzee and common practice of ''hunting and field-dressing chimpanzee'' in west central africa [20] . similarly in southern china, where sars-cov initially emerged, people used to consume wild animal meat and some of the animals are now confirmed to carry sars-like coronavirus [12] . another question is whether sars outbreak will come back. at the beginning of 2004, three sars cases were reported indicating sars do come back. however, the situation of this year seems quite different from last year, since transmission, infection and severity of sars-cov were clearly weakened. one possible explanation is that it might be just a preface of sars epidemics. like last year, in the early period of sars pandemics, the virus did not show strong toxicity. another possibility is that sars-cov might be truly weakened due to many reasons including genetic mutations, like the influenza flua virus which has caused a disaster outbreak in 1918 and was weakened after the pandemic that took 20 million lives [21] . influenza epidemics throughout the world occurred periodically between the first pandemic and present time due to the viral antigenic drift and shift. these processes also resulted in the appearance of influenza b and c virus with significant differences in genetic characterizations [22] . 
it would be important to find out whether sars-cov follows epidemic patterns similar to those of influenza virus, and whether sars-cov is weakening or sars will break out periodically. while these questions remain to be addressed, it is certain that sars-cov has a high mutation rate on its genome, which could in turn play significant roles in its pathogenicity and in epidemics of the disease.
table 3. summary of genetic mutations within genes of 47 sars-associated coronaviruses
orf 1a  position       2557  3852  9404  9854  11448  11493
        mutation rate  3     14    7     6     3      14
molecular epidemiology and genome-wide analysis of mutations among sars-cov have provided insights into some of these questions [11, [14] [15] [16] [17] [18]. for instance, besides the geographic distribution of potential animal reservoirs, the high homologies between sars-cov of humans and sars-like coronavirus of animals strongly supported the hypothesis of an animal origin of sars-cov [12]. it is possible that some mutations on the viral genome were responsible for the transmission of sars-cov from animals to humans. in an effort to study sars-cov, we identified and genetically sequenced a new sars-cov isolated from a patient with sars in hubei province. hubei was an area of china with few sars cases: only a total of three patients were confirmed as probable sars cases and only one viral strain was isolated from this region. these facts prompted us to study this virus further. our sequence analysis indicated that although the overall genome organization of whu (fig. 2) is in agreement with published studies on other isolates, whu carried a two-nucleotide deletion at positions 27825 and 27826 and was genetically distinct from most sars-cov isolates. these results suggested that mutations occurred during the viral transmission from beijing to hubei, although we do not know at this point whether these mutations have any biological significance.
it is interesting to notice that although the sars-cov virus evaded human population only for 6 months, its genetic information already altered in many ways during its short journey of human transmission. individual viral genes displayed distinct patterns of genetic mutations at different time during the sars outbreak. for instance, mutability of the s gene was high during early-mid period, but low during mid-late period of the epidemic, which suggested that mutability of s gene decreased as viral transmission increased. one possible explanation for this observation is that during early-mid period of the epidemic, as the gene encoding protein for the recognition of receptors of the host and for the mediation of viral entry into host cells, s gene had to change at a high frequency in order to quickly fulfill its biological roles. once the viral adaptation to human cells completed or reached its equilibrium, genetic changes were less important or no longer needed. thus, genetic information of s gene became relatively stable during mid-late period of the outbreak [23] . another example is orf lab that encodes the polyprotein of sars-cov. like s gene, orf lab was also actively involved in genetic mutations. however, in contrast to s gene, mutability of orf lab was low at the beginning, but high during midlate period of the epidemics. this observation can be explained well by the fact that the toxicity of sars-cov was weakened in mid-late period. other structural genes including e, m, and n genes were more conserved at beginning of the outbreak, but underwent genetic changes at the end of transmission. this pattern of genetic mutation obviously reflects biological roles of these structural genes in viral particles assembly, which in turn crucial for the virus to fight with increasing immune pressures from the hosts. genetic analysis of non-structural genes showed that they intended to keep genetic information conserved throughout the entire process of transmission. 
therefore, these genes may prove to be ideal targets for the diagnosis of sars-cov, screening of antiviral drugs, and perhaps development of antiviral vaccines. patterns of genetic mutations of certain viral genes were linked to the geographic locations where the viruses were isolated. mutations at positions 3852 and 26203 within the orf 1a and e genes could clearly set the taiwan isolates apart from others. thus, these two positions may be used as molecular signatures in the identification of taiwan isolates. similar phenomena were also found in three viral strains (sod, fra, and frankfurt) isolated from europe during the mid-late period of the outbreak. these viral strains had mutations at the same positions (2557, 11,448 and 24,933), while isolates from other regions did not show any changes at these positions. this kind of specific mutation pattern may reflect the relatively independent geographical locations of taiwan and europe. we speculate that populations in these regions perhaps developed unique immunity due to their unique locations, for which the virus had to make specific genetic mutations in order to invade these populations. in addition, based on genome-wide mutation analysis, some viral strains isolated from beijing had a close relationship to isolates identified from southern china during the early-mid period of the outbreak. this suggests that at least these sars-cov isolates found in beijing were originally from southern china. much remains to be done in order to understand thoroughly the evolution, transmission, origin, and infection of sars-associated coronavirus. it is interesting to recognize that genome-wide mutation analysis could provide new insights into the route of viral transmission and the prediction or perhaps prevention of future sars epidemics. our study would provide a rational and hypothesis-driven approach to study these questions, develop rapid diagnostic tests, and design measures to prevent this fatal disease.
in addition, fully understand molecular mechanism of genetic mutations would provide insights into understanding plausible transmission route of sars-cov from animal to humans as well as from human to human, and trends of changing in pathogenecity of sars-cov during its rout of transmission and path of evolution. cumulative number of reported probable cases of severe acute respiratory syndrome (sars) department of communicable disease surveillance and response. who consensus document on the epidemiology of severe acute respiratory syndrome (sars) this research was supported by the sars special grant of wuhan university. key: cord-005252-m02inmc4 authors: kwon, hyuk moo; jackwood, mark w. title: molecular cloning and sequence comparison of the s1 glycoprotein of the gray and jmk strains of avian infectious bronchitis virus date: 1995 journal: virus genes doi: 10.1007/bf01702878 sha: doc_id: 5252 cord_uid: m02inmc4 the nucleotide sequences of s1 glycoprotein genes of the gray and jmk strains of avian infectious bronchitis virus (ibv) were determined and compared with published sequences for ibv. the ibv gray and jmk strains had 99% nucleotide sequence similarity. the overall nucleotide sequence similarity of the gray and jmk strains compared with other ibv strains was between 82.0% and 87.4%. the similarity of the predicted amino acid sequence for the s1 glycoproteins of the gray and jmk strains was 98.8%. six of the 10 differences in the amino acid sequence were found between residues 99 and 127, suggesting a possible role for that region in the tissue trophisms of the viruses. the s1 glycoprotein of the gray and jmk strains had 79.5%–84.6% amino acid similarity with the published sequence of other ibv strains. serine instead of phenylalanine was observed in the protease cleavage site between the s1 and s2 glycoprotein subunits for the gray and jmk strains, which was similar to the published sequence for the ark99 and se17 strains. 
the significance of that amino acid change is not known. based on the nucleotide sequence of the gray and jmk strains, the bsmai restriction enzyme was selected by computer analysis and was used in restriction fragment length polymorphism analysis to differentiate the two strains.
present address: ohio agricultural research and development center, fahrp, the ohio state university, wooster, oh 44691, usa. the nucleotide sequence data reported in this paper have been submitted to the genbank nucleotide sequence database and have been assigned the accession numbers grays1 = l14069 and jmks1 = l14070.
avian infectious bronchitis virus (ibv) causes an acute, highly contagious disease of the respiratory and sometimes the urogenital tracts of chickens. infectious bronchitis (ib) is an economically important disease to the poultry industry, and outbreaks continue to occur because different ibv serotypes do not completely cross-protect (1). the virus is the type species of the family coronaviridae, and its genome consists of one molecule of positive sense single-stranded rna (2). it has three major structural proteins: a nucleocapsid protein, an integral membrane glycoprotein, and a spike (s) glycoprotein (3, 4). the s glycoprotein is cleaved into n-terminal s1 and c-terminal s2 subunits (5, 6). the s1 glycoprotein forms the distal, bulbous part of the s glycoprotein, and the s2 glycoprotein anchors the s glycoprotein to the membrane of the virion (7, 8). neutralizing, hemagglutination-inhibiting, and serotype-specific antibodies are directed against the s1 glycoprotein (9) (10) (11) (12). tissue tropism has also been associated with the s1 glycoprotein (13). the s glycoprotein gene of several serotypes of ibv has been sequenced to investigate the antigenic variation of ibv at the molecular level (14) (15) (16) (17) (18).
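the similarity figures quoted in the abstract, and the observation that differences cluster between residues 99 and 127, come from position-by-position comparison of aligned sequences. the sketch below shows one simple way to do this, assuming that two sequences are already aligned to equal length and that similarity is taken as plain percent identity. that is our simplification: the pc/gene software used by the authors may score similarity differently, and the sequences shown are toy data, not the gray or jmk s1 sequences.

```python
def compare_aligned(a: str, b: str) -> tuple[float, list[int]]:
    """Return (percent identity, 1-based positions that differ)
    for two sequences of equal, non-zero length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal length, non-empty")
    diffs = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    identity = 100.0 * (len(a) - len(diffs)) / len(a)
    return identity, diffs


def count_in_window(positions: list[int], start: int, end: int) -> int:
    """Number of listed positions falling in the inclusive window."""
    return sum(start <= p <= end for p in positions)


# toy 10-residue peptides (hypothetical)
identity, diffs = compare_aligned("MLVKSLFLVT", "MLVKSLFIVT")
print(identity, diffs)                # 90.0 [8]
print(count_in_window(diffs, 5, 10))  # 1
```

applied to real aligned s1 sequences, `count_in_window(diffs, 99, 127)` would give the number of differences falling in the region the abstract highlights.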
an amino acid sequence comparison of the massachusetts 41 (mass41) vaccine strain and the beaudette laboratory strain revealed that s1 had two hypervariable regions (hvrs) (17). antigenic and serotypic determinants of ibv are thought to be located in the hvrs (3, 16, 19). recently we reported on a polymerase chain reaction/restriction fragment length polymorphism (pcr/rflp) procedure to distinguish between serotypes of ibv (20). in that procedure three restriction enzymes (re) were used to distinguish all of the known serotypes within the united states, as well as variant viruses. only the gray and jmk strains could not be differentiated from each other. in an attempt to distinguish between the gray and jmk strains, over 23 re were tested unsuccessfully. serology indicates that the gray and jmk strains are closely related and belong to the jmk serotype (21). the gray strain, however, is nephropathogenic (22, 23), whereas nephrotropism has not been reported for the jmk strain. the objectives of the present study were to clone and sequence the s1 glycoprotein gene of the gray and jmk strains of ibv in order to identify an re that would differentiate the two strains in the pcr/rflp serotype identification test. it is important to differentiate the two strains in a diagnostic test because the gray strain is nephropathogenic. in addition, it is useful to know the sequence of serologically similar viruses that have differences in their tissue tropism. with that information we can begin to identify regions in the viral genome that may be associated with pathogenicity. dr. jack gelb, jr. (university of delaware, newark, de) provided one gray strain (22), chicken embryo passage 10, and two (received at different times) jmk strains (23), chicken embryo passage number 11. another gray strain (22), chicken embryo passage 9, was obtained from dr. pedro villegas (university of georgia, athens, ga). all were passaged once in embryonating chicken eggs.
the viral rna was extracted and purified as previously described (20). briefly, sodium dodecylsulfate (final concentration, 2% wt/vol) and proteinase k (final concentration, 250 µg/ml) were added to allantoic fluid, incubated for 5 min at 55°c, and extracted with acid phenol and chloroform/isoamyl alcohol. the rna solution was further purified using the rnaid™ kit (bio 101) according to the manufacturer's recommendation, then stored at -70°c until used in the reverse transcriptase (rt) reaction. the s1oligo5′ and s1oligo3′ primers for the rt reaction and pcr, synthesized by the university of georgia molecular genetics facility, have been described previously (20). the sequence of the primers and their relative position in relationship to the s1 glycoprotein gene are shown in fig. 1. all of the reagents for the rt reaction and pcr have been described previously (20). reverse transcription of rna purified from allantoic fluid was done with moloney murine leukemia virus reverse transcriptase (gibco brl) and primer s1oligo3′, which is complementary to a region at the 5′ end of the s2 glycoprotein gene. for the pcr reaction, the primer s1oligo5′, which is identical to a sequence near the 5′ end of the s1 glycoprotein gene, and 5 units of amplitaq dna polymerase (perkin-elmer cetus) were added to the rt reaction. pcr was performed for 35 cycles of 94°c for 1 min, 45°c for 2 min, and 74°c for 5 min in a twinblock™ thermal cycler (ericomp). the pcr products were electrophoresed (100 v constant voltage) on a 1% agarose gel containing ethidium bromide (0.5 µg/ml). cdna cloning. the s1 band, with a predicted size of approximately 1.7 kbp, was cut from an agarose gel and purified using the geneclean kit (bio 101) according to the manufacturer's recommendations. the purified dna was ligated into the pcr™ ii (invitrogen corp.) cloning vector, then transformed into competent escherichia coli cells (invαf′, invitrogen).
the white colonies carrying recombinant plasmids were selected from luria-bertani (lb) agar (24) plates containing kanamycin (50 µg/ml) and 25 µl of 40 mg/ml x-gal stock solution. the alkaline lysis method was used for small preparations (mini-preps) of plasmid dna. the purified plasmid dnas were digested with ecori (promega) and analyzed on a 1% agarose gel to determine the size of the insert. cesium chloride density gradient centrifugation was used to obtain larger amounts of plasmid dna for sequencing. denatured double-stranded cloned dna was sequenced by the dideoxy chain termination procedure using the sequenase version 2.0 kit (usb) as recommended by the manufacturer. initially, the m13 forward (usb) and reverse (#1201) primers were used for sequencing. in addition, six other primers were synthesized to various regions within the gray strain of ibv (fig. 1). at least three clones of each strain were sequenced. nucleotide sequence data were compiled and analyzed on an ibm personal computer using the pc/gene software (intelligenetics, inc.). the s1 pcr products of the ibv gray and jmk strains were purified on an agarose gel as previously described (20) and were digested with bsmai (neb, beverly, ma) according to the manufacturer's recommendations. the restriction fragment patterns were observed following electrophoresis (100 v constant voltage) on a 2% agarose gel containing 0.5 µg/ml ethidium bromide. the nucleotide sequence of the entire s1 portion of the s glycoprotein gene, including the signal sequence, for the gray and jmk strains is shown in fig. 2. a comparison of the amino acid sequences deduced from the nucleotide sequences of the gray and jmk strains is shown in fig. 3. also included in figs. 2 and 3 is a comparison with published sequences (14, 17, 18).
[fig. 2. nucleotide sequence alignment of the s1 gene of the gray and jmk strains with the published beaudette, mass41, ark99, se17, and pp14 sequences; the alignment is not reproduced here.]
the ibv gray and jmk strains had similar s1 sequences. the gray and jmk strains differed by only 1% (17/1738) in their nucleotide sequences. the gray and jmk strains had between 82.0% and 87.4% nucleotide identity with the mass41, beaudette, ark99, se17, and pp14 strains. the gray and ark99 strains had the least similarity, and the gray and se17 strains had the most. the gray and jmk strains had 18 extra nucleotides at positions 469-486 (fig. 2) that were not found in the nucleotide sequences of the mass41 and beaudette strains. the gray and jmk strains differed by 1.2% (10/557) in their amino acid sequences. most of the differences in the amino acid sequence were found between residues 60 and 127. a highly variable region containing six differences was observed between residues 99 and 127. the gray and jmk strains had between 79.5% and 84.6% amino acid identity with the mass41, beaudette, ark99, se17, and pp14 strains of ibv. a dendrogram of the amino acid alignment is presented (fig. 4). the gray and jmk strains had the least similarity to mass41, and the most similarity to the se17 strain.
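the nucleotide comparison above reduces to simple counting over an alignment. the following is a minimal sketch, not the pc/gene procedure used in the study: the function names and the toy sequences are hypothetical, and the only value taken from the text is the count of 17 differing positions out of 1738.

```python
def percent_difference(n_diff: int, n_total: int) -> float:
    """Percentage of compared positions that differ."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_diff / n_total


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.lower(), seq_b.lower()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Count reported in the text for the gray/jmk nucleotide comparison.
print(round(percent_difference(17, 1738), 2))   # 0.98, i.e. about 1%

# Hypothetical toy alignment, for illustration only.
print(percent_identity("acgtacgt", "acgtacga"))  # 87.5
```

the same counting applies to amino acid alignments; only the alphabet changes.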
like ark99 and se17, the gray and jmk strains had a serine (residue 523) instead of phenylalanine in the cleavage site of the connecting peptide between the s1 and s2 glycoproteins (fig. 3). based on a computer re analysis of the nucleotide sequence for the gray and jmk strains, the bsmai re was selected for use in the rflp analysis of the two strains. following digestion of the pcr product with bsmai and electrophoresis, the gray and jmk strains had the expected restriction fragment patterns (fig. 5), which could be used to differentiate between them.
[figure legend, fragment] ... and pp14 (ppi) (29) s1 genes. asterisks indicate unavailable sequences. to conform to other published sequences for s1, numbering begins after the signal sequence (boldface). dashes were introduced to align the sequences. the double-underlined sequence is a connecting peptide of the spike precursor polypeptide.
the purpose of sequencing the s1 glycoprotein genes of the gray and jmk strains of ibv was twofold. first, we wanted to identify an re for use in our pcr/rflp serotype identification test that would distinguish between those viruses. second, we wanted to add the sequence of those strains to the growing database of s1 glycoprotein sequences for strains of ibv in the united states. those data are a first step toward identifying neutralizing and serotype-specific epitopes, and regions that are involved in attachment of the virus to target cells. the s1 glycoprotein sequences of gray and jmk presented here are the first published sequences for this serogroup (designated jmk). by computer search and agarose gel electrophoresis, bsmai was found to be the best enzyme for distinguishing between the gray and jmk strains in our pcr/rflp serotype identification test. three restriction sites were observed in the jmk strain at bases 445 (within hvr2), 613, and 1078; the gray strain had two sites at bases 613 and 1078.
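to see why one extra site is enough to separate the two strains on a gel, the site positions above can be turned into expected fragment sizes. this is a rough sketch under stated assumptions: the amplicon length of 1720 bp is hypothetical (the text gives only "approximately 1.7 kbp"), the reported base numbers are treated as cut positions in amplicon coordinates, and the offset between the bsmai recognition site and its actual cut point is ignored.

```python
def fragment_sizes(amplicon_length: int, cut_positions: list[int]) -> list[int]:
    """Sizes of the fragments produced by cutting a linear amplicon."""
    cuts = sorted(set(cut_positions))
    if any(c <= 0 or c >= amplicon_length for c in cuts):
        raise ValueError("cut positions must lie inside the amplicon")
    bounds = [0] + cuts + [amplicon_length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


AMPLICON_LENGTH = 1720  # hypothetical; the paper states ~1.7 kbp

jmk = fragment_sizes(AMPLICON_LENGTH, [445, 613, 1078])
gray = fragment_sizes(AMPLICON_LENGTH, [613, 1078])

print(jmk)   # [445, 168, 465, 642]
print(gray)  # [613, 465, 642]

# Bands present in only one of the two digests.
print(sorted(set(jmk) ^ set(gray)))  # [168, 445, 613]
```

under these assumptions the gray digest keeps one 613 bp band where the jmk digest shows two smaller bands (445 and 168 bp); the remaining bands are shared.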
ten differences in the amino acid sequences of the s1 glycoprotein were observed between the gray and jmk strains. beaudette and mass41 (both massachusetts serotype) are reported to have 26 differences in their amino acid sequences (15). six of the 10 differences between the amino acid sequences of the gray and jmk viruses were in a variable region between residues 99 and 127. this corresponds to a variable region within the massachusetts serotype reported by niesters et al. (17) between residues 117 and 131. the overall differences in the amino acid sequences observed between all of the ibv strains examined herein were located between residues 34 and 138 and between residues 234 and 324. similarly, variable regions between residues 40 and 129 and between residues 271 and 378 have been reported by cavanagh et al. (19) for closely related serotypes of ibv. our data extend this observation to include different serotypes of ibv, suggesting (as others have) that these regions may be involved in forming serotype-specific and virus-neutralizing epitopes. a protease cleavage site between the s1 and s2 glycoprotein subunits was reported to be arg-arg-phe-arg-arg for the beaudette and mass41 viruses (5, 13). the cleavage site of the gray and jmk strains was similar to the recently published sequence for ark99 and se17 (18), wherein a serine instead of a phenylalanine (residue 523) was observed. although both amino acids are uncharged at physiological ph, serine has an aliphatic hydroxyl side chain, whereas phenylalanine has an aromatic side chain. the significance of this amino acid difference with regard to virulence is not known. the gray and jmk strains of ibv are the same serotype, indicating that they are very similar antigenically. however, the pathogenicity of these viruses differs because the gray strain can produce nephritis. it follows that the amino acids located between residues 99 and 127 may play a role in the different pathogenesis observed for these viruses.
this observation is supported by cavanagh et al. (13), who observed an amino acid difference within the hvr2 region of two vaccine viruses, which may account for the differences in virulence observed for those viruses. the molecular basis for tissue tropism may become more apparent as the sequence becomes available for other nephropathogenic strains, such as holte (22), australian t (26), and one of the holland strains (22). we thank the veterinary medical experiment station, university of georgia, for their support in funding these experiments.
key: cord-283168-kl1hoa1x authors: farkas, tibor; fey, brittney; hargitt, edwin; parcells, mark; ladman, brian; murgia, maria; saif, yehia title: molecular detection of novel picornaviruses in chickens and turkeys date: 2011-12-13 journal: virus genes doi: 10.1007/s11262-011-0695-4 sha: doc_id: 283168 cord_uid: kl1hoa1x fecal specimens, including swabs and litter extracts, collected from chickens, domestic ducks, turkeys, and canadian geese were tested using degenerate primers targeting regions encoding conserved amino acid motifs (ygdd and dy(t/s)(r/k/g)wdst) in calicivirus rna-dependent rna polymerases. similar motifs are also present in other rna viruses. two fecal specimens and 18 litter extracts collected from chickens and turkeys yielded rt-pcr products. blast search and phylogenetic analysis revealed that all amplicons represented picornaviruses that clustered into two major groups. four chicken samples and one turkey sample yielded 250 bp amplicons with 84–91% nucleotide identity to the recently described turkey hepatitis viruses, while 280 and 283 bp amplicons obtained from 11 chicken and 4 turkey samples represented novel picornaviruses with the closest nucleotide identity to kobuviruses (54–61%) and turdiviruses (47–54%).
analysis of 2.2–3.2 kb extended genome sequences including the partial p2 (2c) and complete p3 (3a, 3b (vpg), 3c(pro), and 3d(pol)) regions of selected strains indicated that viruses yielding the 280/283 bp amplicons represent a putative new genus of picornaviridae. the 3′-non-translated region (ntr) of the turkey hepatitis-like viruses described in this study was significantly longer (641–654 nt) than that of any of the other picornaviruses and included a putative short open reading frame (orf). in summary, we report the molecular detection of novel picornaviruses that appear to be endemic in both chickens and turkeys. picornaviruses are small, non-enveloped, single-stranded, positive-sense rna viruses with a ~7-9 kb genome. all known picornavirus genomes encode a single long open reading frame (orf) from which a polyprotein is translated and cleaved by virus-encoded proteases to yield the individual structural and non-structural viral proteins. the long orf has been divided into three regions: p1, p2, and p3. the p1 region encodes the viral capsid proteins, while the p2 and p3 regions encode proteins involved in protein processing or genome replication: 2a(pro), 2b, 2c and 3a, 3b (vpg), 3c(pro), 3d(pol), respectively. picornaviruses have been described in humans and different animal species and can be the causative agents of a wide variety of diseases. the picornaviridae family currently consists of 12 genera: enterovirus, cardiovirus, aphthovirus, hepatovirus, parechovirus, erbovirus, kobuvirus, teschovirus, sapelovirus, senecavirus, tremovirus, and avihepatovirus [1]. several other picornaviruses, including the recently described turdiviruses and turkey hepatitis viruses, still await species or genus assignment. turdiviruses were discovered in tracheal and cloacal swabs obtained from dead wild birds of the genus turdus in the family turdidae [2]. two distinct groups representing two proposed new genera (ortho- and paraturdivirus) have been described.
these viruses could not be propagated in cell culture or in chicken embryos, and their prevalence, host range, and disease burden are unknown. picornaviruses that were tentatively named turkey hepatitis viruses (thv) were recently discovered in liver samples collected from diseased turkey poults with turkey hepatitis. thv was also detected in bile, intestine, serum, and cloacal swabs of diseased animals and is the candidate causative agent of turkey hepatitis [3]. based on the morphological descriptions of small round viruses in healthy and diseased avian species [4] [5] [6] [7] [8], we initiated a study for the molecular detection of caliciviruses in avian fecal specimens. this study utilized a broadly reactive primer set targeting regions that encode conserved amino acid motifs present in calicivirus rna-dependent rna polymerases (rdrp) and partially present in other viral rdrps. as part of the study, here we report the serendipitous detection of novel picornaviruses in chicken and turkey samples that included diagnostic cases with runting-stunting syndrome (rss). fecal swabs collected from 42 broiler chickens, 25 domestic ducks, 11 turkeys, and 149 canadian geese in delaware, and 73 litter extracts collected from 4 chicken and 4 turkey farms in north carolina were tested (table 1). twenty-eight of the 42 chicken swabs represented diagnostic cases with rss. all of the other samples were collected from healthy animals. swabs were soaked in 1 ml sterile pbs. litter samples were saturated with sterile pbs and washes were collected. all samples were aliquoted and stored at -80°c. equal volumes of 2-4 samples from the same sample group were pooled together and viral rna was extracted by the qiaamp viral rna mini kit on a qiavac 24 plus vacuum manifold (qiagen inc., valencia, ca), according to the manufacturer's instructions. twenty-two sample pools along with negative (deionized water) and positive (recovirus) controls were extracted at a time.
the titer of the tissue culture-adapted ft285 recovirus strain was adjusted to 10^4 pfu/ml, and 150 µl aliquots were made and stored at -80°c. extracted rna was eluted in 30 µl buffer and stored at -80°c. rna from individual samples of rt-pcr positive pools was extracted as described above. rt-pcr screening and dna sequencing. viral rna was amplified from 3 µl of extracted rna template in 25 µl reactions using the accessquick rt-pcr system (promega, madison, wi) according to the manufacturer's instructions with primers p289/p290 as described in our previous studies [9] [10] [11]. reactions were analyzed on 2% agarose gels in the presence of ethidium bromide. rt-pcr products were excised from agarose gels, recovered by the wizard sv gel and pcr clean-up system, and cloned into the pgem-t vector (promega, madison, wi) according to the manufacturer's protocols. positive clones were identified by pcr. plasmid dna was isolated from 2 ml cultures by the wizard plus sv miniprep dna purification system (promega, madison, wi) according to the manufacturer's instructions and sequenced using m13 forward and reverse primers by the chain termination method on an abi prism 3730 dna analyzer (applied biosystems inc., foster city, ca). each sample was sequenced in both directions from two independent clones. the genome of selected picornavirus strains representing each group was amplified to the 3' end (~1,000 nt) with strain-specific forward primers (ctccactacctcaacactatcc for group 1, tgtgatgattggyggyatg for group 2, and atgagatggaaggaggratg for group 3 viruses, respectively) and an oligo-dt primer. further extension of the p3 region, encoding the 3a, 3b (vpg), 3c(pro), and 3d(pol) proteins, was achieved by primer walking using strain-specific reverse primers and degenerate primers targeting nucleotide sequences encoding the conserved amino acid motifs ddxgq (ttcatcgaygacatcggicar) in the 2c and gxcg (ccttcsagggyitstgygg) in the 3c(pro) regions.
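several of the primers above contain degenerate positions (for example y = c or t, r = a or g), so each primer name stands for a small pool of oligonucleotides. the sketch below expands a degenerate primer into its concrete sequences using the standard iupac nucleotide ambiguity codes. the function name is hypothetical, and inosine (written "i" in two of the primers) is deliberately not handled because it is not an iupac ambiguity symbol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (lowercase, as in the text).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def expand_primer(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    primer = primer.lower().replace(" ", "")
    try:
        choices = [IUPAC[base] for base in primer]
    except KeyError as exc:
        raise ValueError(f"unsupported symbol: {exc.args[0]}") from None
    return ["".join(combo) for combo in product(*choices)]


# Group 2 forward primer: two 'y' positions give 2 x 2 = 4 sequences.
for seq in expand_primer("tgtgatgattggyggyatg"):
    print(seq)

# Group 3 forward primer: one 'r' position gives 2 sequences.
print(len(expand_primer("atgagatggaaggaggratg")))  # 2
```

the number of variants is the product of the number of bases allowed at each position, which is one way to compare the degeneracy of candidate primers.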
sequence and phylogenetic analysis. blast analyses of sequences without the primers were run against ncbi databases. multiple sequence alignments of nucleotide and amino acid sequences were created using the omiga v2.0 software (oxford molecular ltd, oxford, uk). dendrograms were constructed by the unweighted pair group method with arithmetic mean (upgma) and the neighbor-joining clustering methods of the molecular [12]. the confidence values of the internal nodes were obtained by performing 1,025 bootstrap analyses. picornavirus sequences representing all established or proposed genera were included in the analyses (accession numbers are listed in fig. 1). predictions of 3'-ntr secondary structure. secondary structure predictions for 3'-ntr regions were generated using the webserver for aligning non-coding rnas (war, http://genome.ku.dk/resources/war/) [13]. war was used to generate consensus alignments and secondary structures for the terminal 240 nt of each of the 3'-ntrs (chk148, chk168, trk22, trk24, aichi virus, and turdivirus 1), as well as the extended 5'-portions of chk148 and trk22 viruses. war submissions are limited to 250 nt, and the greatest homology among the 3'-ntrs was in the 3'-terminal ~250 nt. the size of 240 nt for the 3'-consensus was chosen because this is the size of the aichi virus 3'-ntr, the smallest of the viruses examined. for the extended 3'-ntrs of chk148 and trk22, the 5'-most 250 nt were used for secondary structure prediction. fasta files of rna alignments were uploaded to the war server using dnastar megalign software (dnastar, lasergene 8.1, madison, wi) and consensus alignments and secondary structure predictions were generated using 14 simultaneous rna structural prediction programs (listed at http://genome.ku.dk/resources/war/). predictions were ranked according to free energies, highest covariance scores, average bp probability, and the fraction of canonical base-pairing.
structures predicted by more than one program are reported. owing to differences in relatedness, consensus structures were developed using comparisons of alignments of chk148 and trk22, and alignments of aichi virus, turdivirus 1, chk168, and trk24 strain 3'-ntrs. tissue culture. rt-pcr positive samples were centrifuged at 10,000 × g for 15 min and sterile filtered through 0.2 µm syringe filters (millipore, billerica, ma). filtered samples were confirmed for the presence of picornaviruses by rt-pcr and sequencing and inoculated onto primary chicken embryo liver/fibroblast, lmh (chicken liver; atcc crl-2117), vero (african green monkey kidney; atcc crl-1586), ma104 (african green monkey kidney; atcc crl-2378), and llc-mk2 (rhesus monkey kidney; atcc ccl-7) cells at 60-70% confluence in 24-well tissue culture plates. cultures were monitored daily for cytopathic effect (cpe), and harvested (medium and cells) at day 5 post-inoculation. after two cycles of freezing and thawing, cell debris was removed by centrifugation and the supernatants were passed to fresh cultures. sub-culturing was performed five times regardless of cpe. passages were evaluated for the presence of picornavirus rna by rt-pcr. pre-screening of pooled samples containing diagnostic cases of chicken specimens and chicken or turkey litter extracts yielded rt-pcr amplicons of the approximate expected size (~300 bp). none of the fecal samples collected from healthy broiler chickens, turkeys, domestic ducks, or canadian geese yielded amplicons with similar [3]. none of the amplicons revealed calicivirus sequences. according to phylogenetic analysis of the deduced short 3d(pol) aa sequences, the 20 sequences obtained in our study fell into two distinct clusters, with distances suggesting the existence of two new genera (genus 1 and genus 2 in this study) within picornaviridae. genus 1 included viruses yielding 280 and 283 bp amplicons and genus 2 comprised the thv-like viruses, yielding 250 bp amplicons (fig. 1).
analyses of the p3 regions and 3'-ntr. genome amplification of selected strains from each group was extended from the p3 region to the poly-a tail. the p3 region of picornaviruses encodes proteins 3a, 3b (vpg), 3c(pro) (protease), and 3d(pol) (rna-dependent rna polymerase). for strains chk165, chk175, trk90, and trk91, a segment (~1 kb) stretching from the p290 primer binding site of 3d(pol) to the poly-a tail was amplified. for strains chk148, chk168, trk22, and trk24, a segment (~1.8-2.2 kb) stretching from the gxcg motif of 3c(pro) to the poly-a tail was amplified. finally, for strain chk1, a segment (~3.2 kb) stretching from the ddxgq motif of the 2c protein to the poly-a tail was amplified. phylogenetic and distance analyses of the complete 3d(pol) amino acid sequences placed genus 1 viruses closer to turdivirus 1 (''orthoturdivirus'') than our original analyses based on partial rdrp sequences, indicating that genus 1 viruses may represent a highly divergent species of ''orthoturdiviruses'' rather than a new genus (fig. 2a). however, analyses of the partial 2c and the 3a-3c region supported their classification as a new genus (fig. 2b; table 3). the 3'-ntr sequences of genus 2 (thv-like) viruses obtained in this study were significantly longer than those of any of the other picornaviruses (table 3). these long 3'-ntrs contained a short open reading frame (orf) encoding a putative protein (98-114 aa) (fig. 3). blast searches of these orfs did not reveal any similarity to known proteins in public databases. secondary structure prediction analysis was performed on these 3'-ntr sequences using a structural and alignment-based collection of rna structure prediction programs (webserver for aligning non-coding rnas, war) (figs. 5, 6). the 3'-most 240 nt of chk148 and trk22 were predicted to form a nearly identical series of stem-loops by rnaforester (fig. 5b; ΔG = -52.2), murlet (ΔG = -50.8), and mafft-rnaalifold (ΔG = -51.0).
the 3'-ntr of aichi virus, and 240 nt of the 3'-ntrs of chk148, trk22, and turdivirus 1 were predicted to form a common set of stem-loops by mafft-rnaalifold (ΔG = -31.8) (fig. 5c). the additional ~300 nt upstream of the 3'-ntr stem-loop structures in chk148, chk178, and trk22 were similarly examined for structurally homologous stem-loops (fig. 6). this region was found to form a series of stem-loops predicted by the rnaforester (ΔG = -43.2), mafft-rnaalifold (ΔG = -24.4), and lara (ΔG = -23.86) programs. the function of these stem-loops, located upstream of the stem-loops common to picornavirus 3'-ntrs and associated with replication, is currently unknown. after inoculation with rt-pcr positive swabs or litter extracts, no cytopathic effect (cpe) was observed in the non-human primate cell line cultures tested (llc-mk2, ma104, and vero) up to five blind passages. in some of the primary chicken embryo liver/fibroblast and lmh cultures, based on the previous description of calicivirus-like particles in avian species [5] [6] [7], including chickens with rss [4, 8], the initial goal of our study was the molecular detection of caliciviruses in avian fecal specimens. fecal swabs collected from broiler chickens, domestic ducks, turkeys, and canadian geese in delaware, and litter extracts collected from chicken and turkey farms in north carolina were tested using p289/p290 (table 1). with the exception of the 28 swabs collected from rss positive chickens in delaware, all of the fecal specimens represented healthy animals. data on the health status of north carolina flocks (litter extracts) were not available. the primers (p289/p290) used in this study target nucleotide sequences encoding conserved amino acid motifs (ygdd and dy(t/s)(r/k/g)wdst) in the calicivirus rdrps. however, because of their common evolutionary origin, the rdrps of rna viruses share several conserved motifs.
indeed, using p289/p290 for the detection of caliciviruses in several previous studies resulted in the unintentional detection of other rna viruses, including rotavirus, porcine kobuvirus, and astrovirus [14] [15] [16]. similarly, in this study novel picornaviruses were serendipitously amplified, yielding amplicons that were indistinguishable from the calicivirus positive control by size (fig. 4). analysis of the complete 3d(pol) sequences of chk1, chk148, chk168, trk22, trk24, and the thvs revealed a ~16-18 nt match in the 23 nt p290 binding site, with dyscfdst and dyscfdss amino acid motifs for genus 1 and genus 2 viruses, respectively. recently, two reports describing the molecular detection of caliciviruses in avian species were published. day et al. [17] reported a partial calicivirus sequence (936 nt) identified in a metagenomic analysis of the turkey gut rna virus community, and wolf et al. [18] reported the full genome sequence (7908 nt) of a chicken calicivirus detected in two clinically normal chickens and one chicken with rss. both of these caliciviruses are genetically related to but distinct from sapovirus and represent two putative new genera of caliciviridae. unfortunately, there are no published data on the prevalence of these avian caliciviruses, and their role in disease still needs to be established. surprisingly, despite the relatively large number and diversity of samples tested, caliciviruses were not detected in any of the samples, including the 28 samples collected from chickens with rss. the primers used in our study could not be evaluated directly for their ability to detect the avian caliciviruses, but sequence analysis of the chicken calicivirus [18] indicated a good match for primer binding at the sites encoding the dysgwdst and ygdd amino acid motifs.
picornaviruses were detected in both chicken and turkey samples, including two fecal swabs collected from chickens with rss, 13 litter samples collected from egg layers, and 5 litter samples collected from turkey farms (table 2). phylogenetic analyses of partial 3d(pol) sequences divided the 20 picornaviruses into two distinct clusters (fig. 1). both clusters contained viruses detected in both chicken and turkey samples, suggesting that these picornaviruses can infect both avian species. fifteen samples, including the two positive swabs from chickens with rss, 9 litter samples collected from egg layers, and 4 litter samples collected from turkeys, contained novel picornaviruses (genus 1) with no closely related sequences in public databases. recently, in a metagenomic analysis of the turkey gut rna virus community, day et al. [17] reported the identification of rna sequences with homology to seven of the nine recognized picornavirus genera, with the largest number of sequences bearing homology to kobuvirus. unfortunately, these sequences are not available from public databases for comparison with sequences obtained in our study. phylogenetic analysis of the entire p3 region, including 3a, 3b (vpg), 3c(pro), and 3d(pol), of chk1 placed genus 1 viruses closer to ''orthoturdivirus'' than our original analysis of the partial 3d(pol) sequences (fig. 1). phylogeny of the complete 3d(pol) sequences separately indicated that genus 1 viruses might represent a highly divergent species of ''orthoturdivirus'' (fig. 2a); however, this was not supported by analyses of the partial 2c and the 3a-3c regions, which placed genus 1 viruses further apart from ''orthoturdivirus'', supporting their classification as a new genus (fig. 2b). in accordance with the results of the phylogenetic distance analysis, alignments of the separate p3 proteins revealed that the chk1 3d(pol) region alone shared a higher (62%) amino acid identity with turdivirus 1, while the 3a-3c(pro) region of p3 and the available partial 2c region of p2 shared a lower identity (28 and 41%, respectively) (table 3). ortho- and paraturdiviruses were discovered recently in tracheal and cloacal swabs obtained from dead wild birds of the genus turdus in the family turdidae [2]. turdiviruses could not be propagated in cell culture or in chicken embryos, and their prevalence, host range, and disease burden are unknown. based on our analysis, genus 1 viruses described in this study represent a putative new picornavirus genus with the closest evolutionary roots to orthoturdivirus. since genus 1 viruses were detected in chicken and turkey samples, we propose the tentative name ''gallivirus'' for the genus. for the final classification and nomenclature of genus 1 viruses, complete genome sequences, host range, pathogenicity, and antigenic relationships need to be determined. the remaining five picornavirus sequences (genus 2) clustered separately from genus 1 viruses and together with the recently described thvs [3] (fig. 1). pairwise amino acid alignments of the complete 3d(pol) revealed a high (97-99%) homology between trk22, chk148, and the thvs. the published sequences of the turkey hepatitis viruses did not include the complete 5'- and 3'-ntrs. in this study, complete 3'-ntr sequences of three genus 2 viruses (chk148, chk178, and trk22) were obtained, revealing a significantly longer 3'-ntr region (641-654 nt) than that of any other picornavirus (table 3). alignments of the 3'-ntrs with the partial (137 and 172 nt) 3'-ntr regions that were published for thvs clearly separated the chicken and turkey viruses into two groups.
chk148 and chk178 had a 52-55% nucleotide identity with thv0091 and thv2993d, while trk22 exhibited an 82-89% identity to thv0091 and thv2993d, respectively. moreover, an eight nucleotide deletion was clearly conserved among thv0091, thv2993d, and trk22. the full-length 3′-ntrs of genus 2 viruses obtained in our study contained a putative short orf: 342 nt (114 aa) for chk148 and chk178, and 294 nt (98 aa) for trk22 (fig. 3). the putative short orf sequences of chk148 and chk178 had 93% nucleotide and 99% amino acid identity to each other but only 53% nucleotide and 34% amino acid identity and 50% amino acid similarity to the trk22 short orf. none of these proteins showed homology to any viral proteins available in public databases. whether these orfs encode a functional protein, and what the relevance of the unusually long 3′-ntr of these viruses is, remains to be established in future studies. structures at the extreme 3′-ends of picornavirus genomes define the orir (3′-ntr origin of replication) and typically include stem-loops (x, y, and z) important for circularization of the genome during minus strand (antigenome) replication. secondary and tertiary structures in the 3′-ntr vary in overall complexity, but have been described as having trna-like folds with the ''kissing'' of stem-loops in a higher order folded pseudoknot [19, 20]. in our analysis of predicted consensus rna secondary structures of the viruses reported here, complex secondary structures were identified for both the extreme 3′-ntrs (fig. 5) and the additional sequences found in chk148 and trk22 3′-ntrs (fig. 6). as sequences at the extreme 3′-ends of picornaviruses, arteriviruses, and coronaviruses are important for antigenome and subsequent genome synthesis [21], and given the limited sequence identity among the strains examined, we used a structural alignment-based set of programs to predict common structural elements of these 3′-ntrs.
as several common structures were predicted using different algorithms, it seems likely that these structures provide the basis for future structure/function studies with respect to their role in genome replication. while the thvs were described in samples collected from turkey poults with symptoms of turkey hepatitis [3] , in our study similar viruses were detected in 4 litter samples collected from egg layer chickens and in 1 litter sample collected from turkey poults. since thvs are the proposed causative agents of turkey hepatitis, evaluation of the pathogenicity of these viruses in chickens is important. preliminary studies indicate that both chk148 and trk22 can be propagated in embryonated chicken eggs (unpublished data). efforts for the tissue culture adaptation of the picornaviruses described in this study were unsuccessful. virus isolation from fecal material can be difficult due to low virus load, toxicity of the material, and the abundance of diverse viral agents that often overgrow the target virus. many enteric viruses require polarized epithelial cells for replication and may have species-specific requirements. more cell lines and primary cell cultures should be evaluated in future studies. the secondary structure alignment of chk 148 and trk 22 3 0 -ntrs (terminal 240 nt), the predicted stem-loop structure (below right), and derived thermodynamic and statistical values for the proposed structure (dg, avg covariation, avg bp probability and canonical bp). the structure shown was predicted by rnaforester using the webserver for aligning non-structural rnas (war, http://genome.ku.dk/resources/war/). c the alignment based on structural prediction for the 3 0 -ntrs of chk168, trk 24, aichi virus, and turdivirus, the predicted stem-loop structure (bottom right) and values generated in the derivation of this structure (bottom left). structural alignments were generated using the iupac nucleotide ambiguity system. 
boxed sequences in alignments b and c correlate with the boxed loops in the secondary structure predications and are provided for reference and orientation in summary, we described the molecular detection of novel picornaviruses in chicken and turkey samples, including viruses that were recently suggested to be the causative agents of turkey hepatitis. these viruses represent two possible new genera of picornaviridae that appear to be endemic in both chickens and turkeys. further characterization of these viruses including their host range and prevalence and studies to link infection to clinical disease such as hepatitis or rss are necessary. virus taxonomy: classification and nomenclature of viruses: ninth report of the international committee on taxonomy of viruses ). c the predicted stem-loop structure (below right), and derived thermodynamic and statistical values for the proposed structure (dg, avg covariation, avg bp probability, and canonical bp). boxed sequences in the alignment in b correlate with the boxed loop in the secondary structure acknowledgments we thank dr. carolyne price for providing the lmh cell line, bryan donnelly for providing the primary chicken embryo fibroblast cells and nicole farkas for helping with sample transport. we also thank dr. margaret k. hostetter for her support. the infectious disease scholar fund of cchmc to t. f. was used to fund this study. key: cord-331919-6kistim2 authors: song, daesub; park, bongkyun title: porcine epidemic diarrhoea virus: a comprehensive review of molecular epidemiology, diagnosis, and vaccines date: 2012-01-22 journal: virus genes doi: 10.1007/s11262-012-0713-1 sha: doc_id: 331919 cord_uid: 6kistim2 the porcine epidemic diarrhoea virus (pedv), a member of the coronaviridae family, causes acute diarrhoea and dehydration in pigs. although it was first identified in europe, it has become increasingly problematic in many asian countries, including korea, china, japan, the philippines, and thailand. 
the economic impacts of the pedv are substantial, given that it results in significant morbidity and mortality in neonatal piglets and is associated with increased costs related to vaccination and disinfection. recently, progress has been made in understanding the molecular epidemiology of pedv, thereby leading to the development of new vaccines. in the current review, we first describe the molecular and genetic characteristics of the pedv. then we discuss its molecular epidemiology and diagnosis, what vaccines are available, and how pedv can be treated. porcine epidemic diarrhoea (ped), which was first observed among english feeder and fattening pigs in 1971 [1], is a devastating enteric disease that manifests as sporadic outbreaks during the winter, leading to damage on breeding farms. characterised by watery diarrhoea, ped resembles transmissible gastroenteritis (tge), but has less of an effect on suckling pigs (<4- to 5-week-old); this is what allowed ped to first be distinguished from the tge virus and other recognized enteropathogenic agents. as it spread through europe, the disease was named 'epidemic viral diarrhoea (evd)'. unlike the earlier outbreaks, which affected fattening pigs, a different type of evd caused acute diarrhoea in pigs of all ages in 1976. this type of evd was classified as evd type 2 [1], different from the previously recognized type 1 [2]. in 1978, evd type 2 was shown to be caused by a coronavirus-like agent [3, 4] in experiments with the isolate cv777, which caused enteropathogenic infection in both piglets [3] and fattening swine. from then on, the disease was called 'porcine epidemic diarrhoea (ped)' [4]. both transmissible gastroenteritis virus (tgev) and porcine epidemic diarrhoea virus (pedv) are classified into group 1 of the genus coronavirus. pedv ranges in diameter from 95 to 190 nm (mean diameter: 130 nm), including its projections.
as in many particles with a tendency to a round shape, the pedv contains a centrally located electron-opaque body; it also possesses widely spaced club-shaped projections measuring 18-23 nm in length. the internal structure of the virus remains unknown. the pedv is sensitive to ether and chloroform and has a density in sucrose of 1.18 g/ml. the virus possesses a glycosylated peplomer (spike, s) protein, pol1 (p1), envelope (e), glycosylated membrane (m) protein, and an unglycosylated rna-binding nucleocapsid (n) protein [5]. cell culture-adapted pedv loses its infectivity when heated to ≥60°c for 30 min, but is moderately stable at 50°c; further, the virus is stable between ph 5.0 and 9.0 at 4°c and between ph 6.5 and 7.5 at 37°c [6]. pedv shows no haemagglutinating activity [6]. pedv can be propagated by orally inoculating piglets, after which, during the early stages of diarrhoea, it is collected from the tissues and contents of the small intestine [3]. vero (african green monkey kidney) cells support the serial propagation of pedv under laboratory conditions; however, growth of the virus depends on the presence of trypsin in the cell culture medium. cytopathic effects consist of vacuolation and formation of syncytia. during the 1980s and 1990s, ped was prevalent throughout europe, in countries such as belgium, england, germany, france, the netherlands, and switzerland (table 1). ped is currently a source of concern in asia, where outbreaks are often more acute and severe than those observed in europe. in this respect, and in their high mortality rates, these resemble tgev outbreaks. for example, japanese outbreaks between september 1993 and june 1994 resulted in 14,000 deaths, with mortality ranging from 30 to 100% in suckling pigs. during these epidemics, adult pigs showed only temporary decreases in appetite and milk production [7].
another ped epidemic occurred in the winter of 1996, during which 39,509 of 56,256 infant farrow-to-finish piglets died after experiencing diarrhoea. between january 1992 and december 1993, 56.3% of viral enteric cases in infant pigs surveyed in korea were attributable to pedv, rather than tgev. the vast majority of outbreaks (90%) involved piglets <10 days old [8]. the clinical lesions of pedv in the small intestine of piglets were similar to those of tgev. lesions are confined to the small intestine, which is distended with yellow fluid (fig. 1). ped outbreaks also occurred in thailand from 2007 to 2008. most of the affected farms reported that the disease first occurred in farrowing barns; 100% of newborn piglets were subsequently lost. between august 1997 and july 1999, 50.4% of 1,258 enteric cases across 5 korean provinces were diagnosed as ped [9]; further, a korean abattoir serosurvey found pedv seroprevalences of 17.6-79% (mean of 45%) in samples from 469 pigs from seven provinces. cumulatively, these results suggest that the virus had become endemic in some areas [10] (table 1). however, recent outbreaks seem to be concentrated in certain countries where the pork industry is prominent, such as the philippines, south korea and china. pedv is an enveloped virus possessing an approximately 28 kb, positive-sense, single-stranded rna genome with a 5′ cap and a 3′ polyadenylated tail [11, 12]. the genome comprises a 5′ untranslated region (utr), a 3′ utr, and at least seven open reading frames (orfs) that encode 4 structural proteins [spike (s), envelope (e), membrane (m), and nucleocapsid (n)] and three non-structural proteins (replicases 1a and 1b, and orf3); these are arranged on the genome in the order 5′-… the polymerase gene consists of 2 large orfs, 1a and 1b, that cover the 5′ two-thirds of the genome and encode the non-structural replicase polyproteins (replicases 1a and 1b).
… week of age died from severe watery diarrhoea after showing signs of dehydration. after the acute outbreak, piglets were anorectic, depressed, vomiting, and producing watery faeces that did not contain any signs of blood. necropsies of deceased piglets from the kimpo outbreak uncovered gross lesions in the small intestines, which were typically fluidic, distended, and yellow, containing a mass of curdled, undigested milk. atrophy of the villi caused the walls of the small intestines to become thin and almost transparent

genes for the major structural proteins s (150-220 kda), e (7 kda), m (20-30 kda), and n (58 kda) are located downstream of the polymerase gene [15, 18, 20]. the orf3 gene, which is an accessory gene, is located between the structural genes. it encodes an accessory protein, the number and sequence of which varies among different coronaviruses [20]. the pedv s protein is a type i glycoprotein composed of 1,383 amino acids (aa). it contains a signal peptide (1-18 aa), neutralising epitopes (499-638, 748-755, 764-771, and 1,368-1,374 aa), a transmembrane domain (1,334-1,356 aa), and a short cytoplasmic domain. the s protein can also be divided into s1 (1-789 aa) and s2 (790-1,383 aa) domains based on its homology with s proteins of other coronaviruses [21-26]. like other coronavirus s proteins, the pedv s protein is a glycoprotein peplomer (surface antigen) on the viral surface, where it plays a pivotal role in regulating interactions with specific host cell receptor glycoproteins to mediate viral entry, and in stimulating induction of neutralising antibodies in the natural host [15, 21-23, 26, 27]. moreover, it is associated with growth adaptation in vitro, and attenuation of virulence in vivo [28, 29]. thus, the s glycoprotein would be a primary target for the development of effective vaccines against pedv.
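the residue ranges given above for the pedv s protein can be held in a small lookup table. the sketch below is illustrative only: the names `S_PROTEIN_REGIONS` and `regions_at` are hypothetical, the coordinates are copied from the text (1,383 aa in total), and the short cytoplasmic domain is left out because no coordinates are given for it.

```python
from typing import Dict, List, Tuple

S_PROTEIN_LENGTH = 1383  # amino acids, as stated in the text

# Inclusive residue ranges copied from the text; names are illustrative.
S_PROTEIN_REGIONS: Dict[str, List[Tuple[int, int]]] = {
    "signal peptide": [(1, 18)],
    "neutralising epitope": [(499, 638), (748, 755), (764, 771), (1368, 1374)],
    "transmembrane domain": [(1334, 1356)],
    "S1 domain": [(1, 789)],
    "S2 domain": [(790, 1383)],
}


def regions_at(position: int) -> List[str]:
    """Return the names of all annotated regions containing a residue."""
    if not 1 <= position <= S_PROTEIN_LENGTH:
        raise ValueError("position is outside the 1,383-aa S protein")
    return [
        name
        for name, spans in S_PROTEIN_REGIONS.items()
        if any(start <= position <= end for start, end in spans)
    ]


if __name__ == "__main__":
    for residue in (10, 500, 1340, 1370):
        print(residue, regions_at(residue))
```

a lookup of this kind makes it easy to see, for example, that the last listed neutralising epitope (1,368-1,374 aa) lies in s2, downstream of the transmembrane domain.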
additional studies of this structure are essential for understanding the genetic relationships between, and diversity of, pedv isolates, the epidemiological status of pedv in the field, and the association between genetic mutations and viral function [29-33]. it was reported that aminopeptidase n is the receptor of tgev, human coronavirus 229e (hcov-229e) and feline coronavirus (fecov), which all belong to group i coronavirus including pedv [34]. the pedv m protein, the most abundant envelope component, is a triple-spanning structural membrane glycoprotein with a short amino-terminal domain on the outside of the virus and a long carboxy-terminal domain on the inside [35]. the m protein not only plays an important role in the viral assembly process [36, 37] but also induces antibodies that neutralise the virus in the presence of its complement [37, 38]. the m protein may play a role in α-interferon (α-ifn) induction [39]. coexpression of m and e proteins allowed the formation of pseudoparticles, which exhibited interferogenic activity similar to that of complete virions [40]. additional work on the m glycoprotein should increase our understanding of the genetic relationships between, and the diversity of, pedv isolates and the epidemic situation of pedv in the field [30, 41-45]. the n protein, which binds to virion rna and provides a structural basis for the helical nucleocapsid, is a basic phosphoprotein associated with the genome [5, 16, 18, 46]. as such, it can be used as the target for the accurate and early diagnosis of pedv infection. it has been suggested that n protein epitopes may be important for induction of cell-mediated immunity (cmi) [38]. whereas the genes encoding the structural proteins have been thoroughly investigated for most coronaviruses, little is known about the functions of the accessory proteins, which are not generally required for virus replication in cultured cells [46-49].
on the contrary, their expression might lead to decreases of viral fitness in vitro, and mutants with inactivated accessory genes are easily selected during serial passage through cell cultures [50] [51] [52] [53] . in general, accessory genes are maintained in field strains [50, 54] , and their loss mainly results in attenuation in the natural host [55] [56] [57] . in the case of pedv, the only accessory gene is orf3, which is thought to influence virulence; cell culture adaptation has been used to alter the orf3 gene in order to reduce virulence [52] , as has been done for tgev [53] . differentiation of orf3 genes between the highly cell-adapted viruses and field viruses could be a marker of adaptation to cell culture and attenuation of the virus [52, 58, 59] . thus, measures of variation in orf3 gene differentiation could be a valuable tool in molecular epidemiology studies of the pedv [42, 45, 52, 59] . genetic and phylogenetic analyses based on the s, m, and orf3 genes have been used to determine the relatedness of pedv isolates, both within korea and among various countries in which pedv has surfaced. research on part of the s gene, and on the entire m gene, have suggested that pedvs can be separated into three groups (g1, g2, g3), which have three subgroups (g1-1, g1-2, g1-3) [32] . according to analysis of the partial s genes, the g1 pedvs had 95.1-100% nucleotide sequence similarities with each other, and they had 93.5-96.7 and 88.7-91.5% sequence identities with the g2 and g3 pedvs, respectively. the g2 pedvs had 96.7-99.8% similarities with each other, and they had 91.8-93.0% similarities with the g3 pedvs [32] . these results reflect the existence of genetic diversity among the korean pedv isolates (fig. 3) . the majority of the korean pedv isolates are closely related to chinese strains [32] . the chinese pedv clade also contains all strains isolated from several outbreaks of pedv that have occurred in thailand since late 2007. 
these classifications have been based on the phylogenetic relationship of the s genes, and support the results of park et al. [32]. recently, an analysis of the full s gene-based phylogenetic tree [31] indicated that all pedvs can be separated into 2 clusters, and that korean field isolates are more closely related to each other. in 2006, an analysis of the m gene of 6 pedvs isolated from the faeces of chinese piglets indicated that the isolates compose a separate cluster with chinese strain js-2004-02 [60]. these results demonstrated that there may be a new prevailing pedv genotype in china [60]. phylogenetic relationships of complete m gene nucleotide sequences indicate that recent thai pedv isolates are closely related to isolates from china [30]. likewise, most korean pedv isolates have been found to be closely related to chinese strains [45], and belong to the third of 3 pedv groups containing all pedv isolates [45].

fig. 3 relationships among pedvs isolated from various countries based on the partial s gene including the epitope region. the phylogenetic tree was constructed using the neighbor-joining method in mega version 5.05 with pairwise distances [99]. bootstrap values (based on 1,000 replicates) for each node are given if >60%. the scale bar indicates nucleotide substitutions per site. an asterisk marks a pedv isolate whose sequence in the genbank database was shorter than those of the other reference strains. pedvs isolated from various countries are marked with different colors: europe (black), korea (blue), china (red), japan (olive green), thailand (green) and viet nam (purple)

investigations of the orf3 gene have revealed the reemergence of pedv in immunised swine herds since early 2006 [42]. orf3 genes have been used to divide chinese field strains and pedv reference strains into 3 groups; further, chinese field strains appear to be closely related to korean strains, but genetically different from pedv vaccine strains.
another report revealed that pedv has caused enteric disease with devastating impact since the first identification of pedv in 1992 in korea, and recent, prevalent korean pedv field isolates are closely related to chinese field strains but differ genetically from european strains and vaccine strains [45] . a diagnosis of ped cannot be made on the basis of clinical signs and histopathological lesions [61] [62] [63] [64] . due to the similarities in causative agents of diarrhoea, differential diagnosis is necessary to identify the pedv in the laboratory [64, 65] . many techniques have been used for the detection of pedv, including immunofluorescence (if) tests, immunohistochemical techniques, direct electron microscopy, and enzyme-linked immunosorbent assays (elisa). however, these techniques are time-consuming and are low in sensitivity and specificity [66] . kim et al. [67] compared three techniques (rt-pcr, immunohistochemistry and in situ hybridization) for the detection of pedv. they concluded that although rt-pcr identified the presence of pedv more frequently than the other methods, when only formalin-fixed tissues are submitted, immunohistochemistry and in situ hybridization would be useful methods for the detection of pedv ag and nucleic acid. the pedv leader sequence was used to develop a reverse transcriptase polymerase chain reaction (rt-pcr) diagnostic technique [68, 69] that has successfully been used to detect both laboratory and field isolates [70, 71] . m gene-derived primers can be used in an rt-pcr system to obtain pedv-specific fragments [69] , and duplex rt-pcr has been used to differentiate between tgev and pedv [66] . the past few years have seen several useful modifications of the basic rt-pcr method. 
for instance, it is possible to estimate the potential transmission of pedv by comparing viral shedding load with a standard internal control dna curve [72], as well as to perform multiplex rt-pcr to detect pedv in the presence of various viruses [73], a technique that is particularly useful for rapid, sensitive, and cost-effective diagnosis of acute swine viral gastroenteritis. the commercial dual priming oligonucleotide (dpo) system (seegene, seoul, korea) was also developed for the rapid differential detection of pedv. it employs a single-tube 1-step multiplex rt-pcr with two separate primer segments to block non-specific priming [74]. another useful reverse transcription-based diagnostic tool is rt loop-mediated isothermal amplification (rt-lamp). this assay, which uses 4-6 primers that recognize 6-8 regions of target dna, is more sensitive than gel-based rt-pcr and elisa, in large part because it produces a greater quantity of dna [75]. immunochromatographic assay kits can be used at farms in order to detect pedv s proteins with 92% sensitivity and 98% specificity. this technique is less accurate than rt-pcr, but allows diagnosis within 10 min. thus, it is particularly effective for quickly determining quarantine or slaughter policies in the field. in korea in particular, the endemic status of ped has led to several commercialised pedv detection systems, including conventional duplex rt-pcr (intron biotechnology, inc, korea), real-time rt-pcr (kogenebiotech, korea), dpo-based multiplex rt-pcr (seegene, seoul, korea), and immunochromatography (bionote, korea). recently, a protein-based elisa was developed to detect pedv. in this technique, a polyclonal antibody is produced by immunising rabbits with the purified product of the pedv m gene after its expression in escherichia coli. immunofluorescence (if) analysis with the anti-pedv-m antibody can then detect pedv-infected cells in the presence of other enteric viruses [76].
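the sensitivity and specificity figures quoted above for the immunochromatographic kits are ratios taken from a comparison against a reference test. a minimal sketch of that arithmetic follows; the function name is hypothetical and the counts in the usage example are invented solely so that the output reproduces the quoted 92% and 98%, since the underlying sample counts are not given in the text.

```python
from typing import Tuple


def sensitivity_specificity(
    true_pos: int, false_neg: int, true_neg: int, false_pos: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) as fractions between 0 and 1.

    sensitivity = TP / (TP + FN): share of truly infected samples detected.
    specificity = TN / (TN + FP): share of uninfected samples called negative.
    """
    if min(true_pos, false_neg, true_neg, false_pos) < 0:
        raise ValueError("counts must be non-negative")
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts (100 reference-positive, 100 reference-negative).
    sens, spec = sensitivity_specificity(
        true_pos=92, false_neg=8, true_neg=98, false_pos=2
    )
    print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

with these invented counts, 8 of 100 infected samples would be missed, which illustrates why a rapid field test of this kind is usually paired with rt-pcr confirmation.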
elisa blocking and indirect if have been used to detect pedv antibodies at 7 and 10-13 days postinoculation, respectively [77] . for all tests, the second (convalescent) serum sample should be collected and examined no sooner than 2-3 weeks after the onset of diarrhoea. pedv antibodies, detected by the elisa-blocking and if-blocking tests, have been found to persist for at least 1 year. due to the special features of the porcine mucosal immune system, the presence of serum antibodies against gastroenteric pathogens is not always correlated with protection; rather, detection of these antibodies only proves that individuals had contact with infectious microorganisms [78] [79] [80] . additionally, ha et al. [81] recently reported that colostrum iga concentration is a better marker of protection from pedv infection than serum neutralising (sn) titre from serum samples; however, sn titres may still be useful in determining herd infection status [81] . until they are 4-to 13-day old, piglets are protected against pedv by specific igg antibodies from the colostrum and milk of immune sows [82] ; the length of immunity depends on the titre of the mother. after antigenic sensitisation in the gut, iga immunocytes migrate to the mammary gland, where they localise and secrete iga antibodies into colostrum and milk. this 'gut-mammary' immunologic axis is an important concept in designing optimal vaccines to provide effective lactogenic immunity [83] . pigs that regularly suckle the immune mother are constantly inoculating their lumens with milk-bound iga antibodies, a process that confers passive immunity. igg accounts for more than 60% of colostrum immunoglobulin content. however, iga is more effective at neutralising orally infectious pathogens than either igg or igm because it is more resistant to proteolytic degradation in the intestinal tract and has a higher virus neutralising ability than igg and igm [84] . 
therefore, only passive transfer of iga from an immunised mother effectively induces immune responses in suckling piglets [85]. however, these antibodies do not protect against intestinal infection with pedv. several pedv vaccines, which differ in their genomic sequence, mode of delivery, and efficacy, have been developed. a cell culture adaptation of the cv777 strain had a strikingly different genomic sequence [18], was associated with much lower virulence in newborn caesarean-derived piglets, and caused much less severe histopathological changes. however, in europe, the disease caused by pedv was not of sufficient economic importance to prompt vaccine development. vaccine development was therefore pursued mainly in asian countries, where pedv outbreaks have been severe enough to increase mortality in newborn piglets. an alternative vaccine for suckling piglets may be an attenuated form of the virus derived from serial passage (passage level: 93) of the pedv [86]. in japan, a commercial attenuated virus vaccine of cell culture-adapted pedv (p-5v) has been administered to sows since 1997. although these vaccines were considered efficacious, not all sows developed solid lactogenic immunity [87]. oral vaccination with attenuated pedv dr13 (passage level: 100) has recently been proven to be more efficacious than injectable vaccine. further, this vaccine candidate remained safe even after three back passages in piglets [88]. piglet mortality can be reduced by orally inoculating pregnant sows with the dr13 strain. the viral strain was licensed and has been used as an oral vaccine in south korea since 2004 (patent no. 0502008); the oral vaccine was registered and commercialised in the philippines in 2011. despite the documented benefits of the dr13 vaccine, it does not significantly alter the duration of virus shedding, an indication of immune protection [79, 89], in challenged piglets.
shorter periods of virus shedding, as well as reduced severity and duration of diarrhoea in piglets, result from higher titres of serum antibodies; complete protection from pedv infection prevents shedding after exposure to viral challenge [90]. oral immunisation with highly attenuated pedv confers partial protection against virulent challenge in conventional pigs, a result that is related to inoculation dose. at low doses of the attenuated pedv, 25% of pigs are protected against pedv challenge, but this proportion increased to 50% when pigs were inoculated with a dose 20 times higher [91]. however, viral shedding may be difficult to measure accurately, as it varies depending on viral strain and the sensitivity of the detection tool [72]. therefore, the development of next-generation vaccines should focus on several criteria, including factors related to the reduction of virus shedding in piglets and the details of mucosal immunity to pedv. information on pedv mucosal immunity has typically been limited. de arriba et al. used the enzyme-linked immunospot (elispot) technique to characterise the isotype-specific antibody-secreting cells in mucosal and systemic-associated lymphoid tissues in pigs inoculated with pedv. after infection with pedv, levels of antibody-secreting cells (ascs) in the gut were similar to those observed in response to tgev and rotavirus infection; igg ascs were more prevalent than iga ascs. in pedv-infected pigs, a limited number of igm ascs were detected at post infection day (pid) 4, and memory b cells appeared at pid 21 in mesenteric lymph nodes, spleen, and blood. finally, the authors noted correlations between protection and both serum isotype-specific antibody and asc response in gut-associated lymphoid tissues and blood on the challenge day [90-92].
there have also been reports of immune responses by transgenic plants and lactic acid bacteria that express the pedv antigen [85, 93, 94] . the transgenic tobacco plants that express the s protein corresponding to the neutralising epitope of pedv was tested whether feeding the plants induced the immune response in murine model. and the efficacy of orally administered antigen gene transgenic carrot and lettuce were tested after codon optimization and application of viral expression systems [85] . in mice, induced antibodies have neutralising activity against pedv. no neutralising antibodies were detected in either mice or pigs given mucosal immunizations with recombinant lactobacillus casei expressing pedv n (nucleoprotein) on its surface. however, this treatment elicited high levels of mucosal iga and circulation igg immune responses against the pedv n protein. before this vaccine can be commercialised, further studies are needed; for instance, it will be necessary to understand discrepancies between test results of the first lab scale vaccine and large-scale pilot vaccines. research into this and other potential vaccines should be made a priority, as pedv-mediated diarrhoea causes significant economic losses in the swine industry. however, there is also a potential drawback to the use of live-attenuated vaccines. recently, a survey conducted in china indicated close phylogenetic relationships between a chinese pedv field strain (ch/gsjiii/07) and two vaccine strains, suggesting that live vaccines can evolve into more infectious forms in the field [42] . during the european outbreak of pedv, pregnant sows were deliberately exposed to the intestinal contents of dead infected pigs, thus artificially stimulating lactogenic immunity and, hopefully, shortening the duration of outbreaks at farms [12] . however, several complications arose from this treatment. 
because the intestinal contents did not have homogenous titres of pedv, the induction of immunity-including solid lactogenic immunity-might not be expected. diseases may be spread via contamination with viral agents, such as prrsv and pcv2. immunoprophylactic agents may also be used to treat pedv. for instance, anti-pedv chicken egg yolk immunoglobulin (igy) and colostrums from immunized cows have been found to increase survival rates of virally challenged piglets [95, 96] . mouse monoclonal single chain variable fragment (scfv) antibodies to neutralised pedv, which can be expressed in e. coli, are as potent as parental antibodies and block pedv infection into target cells in vitro [97] . thus, it is possible that recombinant e. coli cells expressing scfv can be used as prophylactic agents against pedv infection. epidermal growth factor (egf), which stimulates the proliferation of intestinal crypt epithelial cells and promotes recovery from atrophic enteritis in pedvinfected piglets [98] , has also been proposed as a potential novel therapy to promote intestinal villous recovery in piglets with pedv infections; it may also be useful in other species with viral atrophic enteritis. drawbacks of this treatment include its high price and questionable safety. 
pig farming veterinary virology disease of swine clinical histopathological and immunohistochemical findings nidoviruses the coronaviridae coronavirus immunogens revue canadienne de recherche veterinaire revue canadienne de recherche veterinaire disease of swine development of an elisa for the detection of antibody isotypes against porcine epidemic diarrhoea virus (pedv) in sow's milk key: cord-311712-lkvt9slp authors: barrett, john w.; sun, yunming; nazarian, steven h.; belsito, tara a.; brunetti, craig r.; mcfadden, grant title: optimization of codon usage of poxvirus genes allows for improved transient expression in mammalian cells date: 2006 journal: virus genes doi: 10.1007/s11262-005-0035-7 sha: doc_id: 311712 cord_uid: lkvt9slp transient expression of viral genes from certain poxviruses in uninfected mammalian cells can sometimes be unexpectedly inefficient. the reasons for poor expression levels can be due to a number of features of the gene cassette, such as cryptic splice sites, polymerase ii termination sequences or motifs that lead to mrna instability. here we suggest that in some cases the problem of low protein expression in transfected mammalian cells may be due to inefficient codon usage. we have observed that for many poxvirus genes from the yatapoxvirus genus this deficiency can be overcome by synthesis of the gene with codon sequences optimized for expression in primate cells. this led us to examine codon usage across two dozen sequenced members of the poxviridae. we conclude that codon usage is surprisingly divergent across the different poxviridae genera but is much more conserved within a single genus. thus, poxviridae genera can be divided into distinct groups based on their observed codon bias. when viewed in this context, successful transient expression of transfected poxvirus genes in uninfected mammalian cells can be more accurately predicted based on codon bias. 
as a corollary, for specific poxvirus genes with less favorable codon usage, codon optimization can result in profoundly increased transient expression levels following transfection of uninfected mammalian cell lines. our lab is interested in the dissection of poxvirus gene function, particularly those genes with a predicted immunomodulatory function [1]. towards this goal we routinely attempt to express specific poxvirus open reading frames (orfs) from mammalian expression vectors in uninfected cells for further study in the absence of other competing viral proteins. as well, expression vectors often allow the fusion of the viral protein in-frame with epitope tags that permit detection of the fused, expressed protein. this strategy has generally been successful for the transient expression of leporipoxvirus genes; however, we have consistently experienced difficulty expressing many yatapoxvirus genes from mammalian expression vectors. to date we have cloned several dozen viral genes from both tanapox virus (tpv) and yaba monkey tumor virus (ymtv) into the expression vector pcdna3.1myc/his (invitrogen), and have routinely observed little or no protein expression following transfection into human or primate cells. this poor transient expression could be due to the presence of cryptic splice sites, polymerase ii termination sites or mrna instability motifs within the orf resulting in truncated, incomplete or unstable transcripts. however, another explanation is that inefficient codon usage could restrict the amount of translated product from mammalian cells [2] [3] [4]. to probe this issue, we have employed the baculovirus expression system (bes) to over-express yatapoxvirus genes of interest, usually with great success [5, 6]. to date, all of the yatapoxvirus genes we have cloned into acnpv are expressed efficiently. 
although the bes has numerous advantages and allows production of moderate quantities of poxvirus protein, there are still advantages to being able to transiently express a poxvirus gene in an uninfected mammalian cell. as well, we have frequently mutated any predicted cryptic splice sites without altering the encoded amino acid sequence. although such predicted splice sequences could be altered by site directed mutagenesis, we were never able to transiently express yatapoxvirus proteins with efficiencies comparable to genes derived from the lepori- or orthopoxviruses (unpublished). however, for several yatapoxvirus genes of interest we chemically synthesized versions with codon sequences optimized for the human translation machinery. these optimized viral gene sequences were then cloned into pcdna3.1myc/his and shown to express at high efficiency in both human (hek293) and non-human primate cells (cos7). these results are consistent with codon optimization of genes from other viruses, including hiv and hpv [2] [3] [4]. this observation led us to examine codon usage bias in members of the poxviridae family. poxvirus members belong to the family poxviridae, which is divided into two sub-families: the entomopoxvirinae, which are invertebrate poxviruses and can be further subdivided into three ''types'' that are restricted to several insect families, and the chordopoxvirinae, which is subdivided into eight genera that infect vertebrates. complete genomic sequences are now available for representatives of all chordopox genera comprising over two dozen representative members (www.poxvirus.org). here we examine the codon usage profiles of these selected poxviruses and try to derive some general principles regarding the ability to predict efficiencies of translation and transient expression of poxvirus genes in mammalian cells. 
poxvirus genomes were identified from ncbi and the open reading frames saved as fasta files using the ''viewing coding regions'' option of entrez. lists of the nucleotide coding sequences were loaded into the online version of codonw ([7]; http://bioweb.pasteur.fr/seqanal/interfaces/codonw.html) and the effective codon number and percent gc at the third position were measured. all data were compiled into excel:mac v2004 (microsoft), manipulations were performed and the numbers were plotted against each other. these plots indicate the codon bias on the y-axis so that the more biased (i.e. non-random) the codon usage is, the closer it will be to a value of 20. the more unbiased (i.e. random) the codon usage, the closer the plot shifts towards 61 (the maximum effective number) where each codon has an equal opportunity to encode an amino acid. transfections and immunoblotting hek293 and cos7 cells were transfected using lipofectamine 2000 (invitrogen inc.) according to manufacturer's specifications. two micrograms of plasmid dna was transfected into each well of a six-well dish. expression was detected with anti-myc (invitrogen) at 1:5,000 dilution, anti-his (qiagen) at 1:10,000 or anti-gp38 [5] at 1:10,000. total rna was extracted from transfected cells at 48 h post transfection using a qiagen rneasy mini kit (qiagen). first strand synthesis was achieved with superscript ii reverse transcriptase (invitrogen) in a 20 µl reaction volume using oligo-(dt) as a primer. the cdna was used as a template for pcr amplification. primers used for pcr amplifying native and mutated 2l were 5′-cccaagcttcatggataagttactattatttagcac (forward primer, hindiii site italicized) and 5′-ccgctcgagggtttccgtcttcttcatcctcttc (reverse primer, xhoi site italicized). primers used for pcr amplifying optimized 2l were 5′-atg aac aaa ctg atc ctg ttc agc (forward primer) and 5′-gcc aag tct tcc tcg tcc tct tcg (reverse primer). 
the reaction mix was incubated for one cycle at 95°c for 3 min and then 30 cycles at 95°c for 30 s, 53°c for 1 min and 72°c for 2 min. products were amplified with platinum taq (invitrogen) and resolved on 1% agarose. transient expression of individual viral proteins allows for the study of the specific gene products without the complications of the background contributions from the other viral proteins. towards this goal, we have cloned several dozen yatapoxvirus genes into mammalian expression vectors to analyze their function. unfortunately, we have been unable to detect any expression of yatapoxvirus genes from mammalian expression vectors. for example, the 2l gene from tanapox virus (t2l), an inhibitor of human tumor necrosis factor (hutnf) [5], can be readily detected by immunoblotting from tpv-infected primate cells; however, it is not detectable when transiently expressed in uninfected primate cells (fig. 1a, compare lanes 1 and 4). in contrast, the same t2l open reading frame is well expressed in the baculovirus expression system (fig. 1b, lane 2). the scenario where certain viral genes, cloned from mammalian viruses, are not expressed following transfection in mammalian cells but are well expressed from baculovirus promoters in insect cells led us to examine the problem further. processing the sequences through software (http://www.friutfly.org/cgi-bin/seq_tools/splice.pl) that searches for cryptic splice sites predicted several potential splice sites (table 1). when we examined native t2l transcript levels we found that the t2l transcript was indeed truncated in transfected cos7 cells (fig. 1c, lane 3). we first attempted to correct for the cryptic splice sites by site directed mutagenesis, which solved the issue of truncated transcripts (fig. 1c, lane 4); however, that did not solve the lack of protein expression (fig. 1a, lane 5). 
to overcome this problem we have synthesized several yatapoxvirus genes with codon optimized sequences that favour translation in human cells (topgene, montreal, qc). because poxvirus genes are normally transcribed in the cytoplasm and never encounter the nucleus there has not been any selection pressure exerted by host cell nucleusresident pathways. codon optimization of t2l led to detection of transcripts of the correct size (fig. 1c , lane 5), and t2l protein from transient expression was now readily detectable with our antibodies (fig. 1a, lane 6 ). in the case of t2l, codon optimization, by adjusting the proportion of at in the third codon position to a higher proportion of gc (fig. 2 ) resulted in the switch from undetectable to significant protein expression. such a dramatic change through codon optimization led us to examine the codon usage patterns in poxvirus family members. the twenty amino acids utilized by the universal translational machinery are encoded by 61 codons. the redundancy of codon specificity, and the particular preference of codon selection within a given species, can be informative about its genetic structure and organization. the range of codon usage bias was therefore examined for the poxviridae. complete genomic sequences for two entomopoxvirus species and 19 representative chordopoxvirus genomes are available in genbank (table 2) . to measure the codon bias within a gene, it is first necessary to determine the actual codon usage and compare it to the possible codon options available for each amino acid. this calculation is considered the effective codon number (nc) and this statistic has been developed for comparative studies and evolutionary divergence analyses [9] . 
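the calculation described above can be illustrated with a short sketch. the code below is not the codonw implementation; it is a minimal python version of the effective codon number in its commonly cited form (nc = 2 + 9/f2 + 1/f3 + 5/f4 + 3/f6, where f is the average codon homozygosity within each degeneracy class), together with a simple third-position gc fraction. the function names are hypothetical, the standard genetic code is assumed, and edge cases (short genes, missing amino acids) may be handled differently by codonw.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def sense_codons(seq):
    """Split an in-frame coding sequence into sense codons.

    Stop codons and codons with ambiguous bases are dropped.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in codons if CODON_TABLE.get(c, "*") != "*"]


def gc3(seq):
    """Fraction of sense codons with G or C in the third position."""
    codons = sense_codons(seq)
    return sum(c[2] in "GC" for c in codons) / len(codons)


def effective_codons(seq):
    """Effective number of codons: 20 = one codon per amino acid, 61 = no bias."""
    counts = Counter(sense_codons(seq))
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_aa[aa].append(counts[codon])

    # Homozygosity F for each amino acid, grouped by number of synonymous codons.
    f_by_class = defaultdict(list)
    for observed in by_aa.values():
        k, n = len(observed), sum(observed)
        if k == 1 or n < 2:  # Met/Trp carry no choice; n < 2 is uninformative
            continue
        f = (n * sum((x / n) ** 2 for x in observed) - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    nc = 2.0  # Met and Trp each contribute exactly one codon
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = f_by_class[k]
        if not fs:
            raise ValueError(f"no usable amino acids with {k} synonymous codons")
        nc += n_amino_acids / (sum(fs) / len(fs))
    return min(nc, 61.0)  # the estimate is capped at the 61 sense codons
```

in this sketch a gene that always uses a single codon for each amino acid returns 20, and a gene that uses all synonymous codons equally returns the capped value of 61, matching the scale described in the methods.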
the effective codon number estimates the average number of codons actually used. (figure caption: above the native t2l sequence, in bold text, are the 17 nucleotides that were altered by site directed mutagenesis to alter the cryptic splice sites and correspond to the sites described in table 1.) we estimated the nc for all orfs using codonw [7]. the effective number of codons (nc) used by the poxviridae was on average 42.4 and ranged from a very biased nc of 26.99 (amsacta moorei entomopoxvirus) to a more random nc of 52.9 (shope fibroma virus) (table 2). two viruses had nc values in the range of 50-61, 11 in the range 40-50, 7 between 30 and 40 and a single species between 20 and 30. although the poxviridae, as a whole, exhibited a range of codon bias, approximately five species displayed extensive bias while the rest exhibited only minor codon usage bias. poxvirus genera can be separated into distinct classes based on codon bias and gc content twenty-one poxvirus genomes were compared by plotting the effective codon number (nc) against the proportion gc in the third position (gc 3 ) (fig. 3). each plot presents the complete complement of orfs from each genome. there is wide variation in effective codon number (nc) and gc 3 % among the species; however, several trends are apparent. generally, all the orfs within a species exhibit a similar gc 3 % and a codon bias that results in a clustering of the orfs. the exception is the entomopoxviruses, which encode a subset of 6-10 genes that appear to deviate from the majority. while the majority of the entomopoxvirus orfs appear to be extremely at rich in the third position, these ''outliers'' have a higher gc 3 content. the parapoxvirus, and to a lesser extent the molluscipoxvirus genomes, also exhibit a subgroup of outlier genes that deviate from the main group (fig. 3). in these genera, there are 16 genes which have a lower percent of gc (less than 70%) in the third position, and these orfs exhibit much less codon bias. 
where available it also appears that members within a specific genus maintain a conserved codon bias reflected in the effective codon number. we plotted the theoretical effective codon number (line) estimated solely on gc concentration. this suggested that for most poxvirus members the actual codon bias was close to the predicted value based on gc content. based on the plots of the 21 genomes we can group all poxviruses into one of four classes (fig. 3) . class one represents genomes with a highly biased codon usage and with a very low gc percentage in the 3rd position. this class includes the two entomopoxviruses only. we would predict that future epv sequences would also reflect this trend. class two has a more random codon usage with a 50% gc in 3rd position and is exclusive to the leporipoxviruses, represented here by myxoma virus and rabbit fibroma virus and appears to encode orfs that exhibit almost random codon usage. class three includes those genomes, which are highly biased in their codon usage but, in contrast to the entomopoxviruses, these species are the final class is the largest and contains the majority of poxvirus genera. once more genomes are sequenced and analyzed this final group may break down into two distinct classes, however for now this final class includes the capripoxviruses, the single member of the suipoxvirus and deerpox, which is unclassified, and these genes are characterized by mild codon bias (nc avg = 39.42) and between 10% and 20% gc 3 (fig. 3) . the remaining members of this class exhibit a more random codon usage pattern (nc avg = 45.2), similar to class two, however, in contrast, the 3rd position gc% is much lower, on average between 25% and 40% and include all published genomes of the orthopoxviruses, avipoxviruses, and yatapoxviruses. overall, poxvirus species exhibit a range of codon bias usage, however members within a genus have evolved a codon usage bias consistent with other members of their genus. 
this conservation of the codon usage appears to be gc concentration specific, rather than dependent on host requirements. for example, when we plot the percent gc 3 for all coding regions against the gc content of the first two codon positions (gc 1+2 ) for each genome, we find a high correlation between gc in position 1 and 2 and maintenance of gc in the 3rd position. the grouping of the members into the four defined groups is easily visualized (fig. 4).
fig. 4 there is a strong correlation between the gc content of the third synonymous codon position (gc3) and the gc content of the first and second codon positions (gc 1+2 ). each data point represents the average values calculated for each poxvirus species. the groupings described for fig. 2 are circled and each species is identified by an abbreviated name. the abbreviations are taken from table 1.
fig. 5 codon bias of conserved genes is not maintained amongst the poxvirus species. the effective codon number as an estimate of codon usage for dna polymerase, p4a and uracil dna glycosidase was calculated for each species and compared within and between species.
highly conserved genes do not share codon bias across species unexpectedly, conservation of codon bias for orthologous genes across the multiple poxvirus genera was not observed. for example, when we examine three highly conserved genes found in all published poxviruses, including dna polymerase, p4a (the major core protein) and uracil dna glycosidase, we find that the codon bias is conserved only within the particular genus (fig. 5). the entomopoxviruses, parapoxviruses and molluscipoxvirus are all highly biased in the codon usage of these three genes and this is reflected in the low effective codon number for these three groups. in contrast, the leporipoxviruses are essentially random in the codon selection and the rest of the species fall somewhere in between. 
therefore it appears that viral genes that are thought to have evolved from a common ancestor have further adapted to the host genetic environment which the individual poxviruses have invaded. this is true of genes that possess a cellular homolog (dna polymerase) and those that are of strictly viral origin (major core protein). eight poxvirus members from four genera have the ability to infect and produce productive infections in humans, including members of the orthopoxviruses (vaccinia, variola, cowpox, monkeypox), the yatapoxviruses (tanapox, yaba monkey tumor virus), the parapoxviruses (orf, pseudocowpox) and molluscipoxviruses (molluscum contagiosum) [10, 11]. it might be predicted that the ability to infect humans would require a codon usage profile that matches codon usage in humans, or possibly a conserved codon bias shared amongst species able to infect humans. however, this is not borne out by analyses of the actual codon bias of these members. the genomic sequences of seven of the eight members with the ability to replicate in humans are available and there does not seem to be any relationship between the codon usage and ability to infect humans. in fact, variola and mcv infections are restricted to human hosts but variola exhibits less codon usage bias (nc = 46.8) than does molluscum contagiosum (nc = 39) despite a dramatic difference in their gc 3 content. the orfs of variola are generally at rich in the gc 3 position (gc 3 = 29%) versus molluscum contagiosum orfs which are very gc rich (gc 3 = 82%; table 2). a plot of the effective codon number against %gc 3 for 135 human cellular genes [9] looks more similar to the profiles for molluscum contagiosum and orf virus (fig. 3) than for variola virus. the profiles for the orthopoxviruses and other members of class 4 appear most similar to effective codon plot profiles for the amoeba, dictyostelium discoideum [9]. 
we have assumed that low expression levels following transfection of certain yatapoxvirus genes were the result of cryptic splice sites that were being processed in the nucleus leading to truncated transcripts. of the yatapox orfs we have tested none has been adequately expressed transiently from pcdna3.1myc/his in mammalian cells. recently we have had three orfs synthesized to optimize codon usage for human cells. the effect on expression of the modified yatapox genes was dramatic. the codon optimization resulted in excellent expression levels from both transfected human (hek293) and non-human primate (cos7) cells (fig. 6). comparison between the natural codon usage and third position gc levels with the optimized orfs indicates that there are some striking differences (fig. 7). the three native orfs have mild codon bias; however, they are strikingly at rich in the 3rd position of the codon. in contrast the optimized versions of the same genes are now strongly biased and are extremely gc rich in the 3rd position of the codon. based on these results and our earlier work it may be possible to predict which pox genomes encode genes which would be resistant to transient expression, using common mammalian expression vectors in mammalian cells (table 3). basically those genomes that are at rich, including the entomopox-, the yatapox-, the orthopox-, capripox-, and suipoxviruses would be predicted to be resistant to transient expression in human and nonhuman primate cells. in contrast we would predict that genes from parapox- and molluscipoxviruses should be well expressed in transient mammalian systems. this might also explain why ymtv and tpv genes are so well expressed in the baculovirus expression system. the high proportion of at in the 3rd position is already adapted to the insect cell environment and may reflect an evolutionary history that involves replication within an insect host. 
examination of all poxvirus genomes and the proportion of gc content at each position of the codon indicates that all the genomes have a decreasing proportion of gc at each successive position except for the leporipoxviruses, the parapoxviruses and the molluscipoxvirus (table 2). for the five species within these three genera the highest proportion of gc occurs in the 3rd position, whereas in all other species the highest gc content occurs in the first position. the relationship between genomic gc content and gc 3 indicates that all pox genomes except for mv, sfv, mcv, orf and bpsv contain an overall gc content between 18% and 30% with a smaller proportion of gc at the 3rd position (table 2). in contrast the other five genomes have an overall gc content that ranges between 40% (mv, sfv) and 65% (orf, bpsv, mcv) and in each case the gc content of the 3rd position is even higher, at about 50% for mv and above 80% gc for mcv, orf and bpsv (table 2). codon usage bias in the poxviridae is related to gc content the total gc content of the poxviridae genomes ranges from 18% (amepv) to 65% (orf virus) gc (table 2). however, the gc 3 % ranges from 8% (msepv) to a staggering 90% (bpsv). poxviruses contain very little non-coding dna within their genomes and since the first two codon positions are constrained by codon specificity requirements it would be predicted that the 3rd position would exhibit the most variation. we compared overall gc content to the gc content at each position of the codon (gc 1 , gc 2 , gc 3 ) calculated from the complete coding complement. the assumption was that the third or synonymous position of the codon would be under less selection pressure because of the redundancy of the amino acid coding. however, we found that the highest correlation was between overall gc content and gc 1 (r 2 = 0.98) and gc 3 (r 2 = 0.98) (fig. 8). all members of the poxviridae maintained this strong correlation. 
and this relationship indicates that codon usage is tightly linked to individual gc content. undetectable levels of transient gene expression of yatapoxvirus genes prompted us to examine codon usage in the family poxviridae. our results indicate that there are high-gc content (parapox- and molluscipoxviruses) and low-gc content (entomopoxviruses) poxviruses and it is those genomes with the largest gc extremes that exhibit the largest bias in codon usage. codon usage bias in poxviruses is skewed in the direction of the overall gc content. we found that optimizing for codon usage resulted in dramatic improvement of the expression signal of transiently expressed genes. in the case described here, the native sequence of tanapox t2l, which is at rich and exhibits some codon bias, was synthesized to increase the codon usage bias by increasing the gc concentration at the third codon position. the percent gc 3 in the native form of the gene was altered from less than 20% to greater than 80% gc in the codon optimized version (fig. 2). because poxviruses replicate exclusively within the cytoplasm the viral transcripts have not evolved under nuclear splicing or processing selection and this may explain the wide variation in gc content within the family poxviridae. the 21 poxvirus members included in our study infect a wide range of hosts; however, they show similar trends between their genomic gc content and amino acid composition, and therefore the codon bias employed. even members whose life cycle is restricted to infection of a single species, such as variola virus and molluscum contagiosum, which only infect humans, maintain amino acid composition related to their own specific gc content.
fig. 7 comparison of natural gene codon usage versus optimized codon usage. the effective codon numbers for three native tpv genes are compared against the effective codon numbers of the same genes following codon optimization. the lines represent best fit.
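the comparison of overall gc content with the gc content at each codon position, reported in the results, can be sketched as follows. this is an illustrative python outline only, not the procedure used by the authors (who compiled codonw output in a spreadsheet); the function names are hypothetical, input sequences are assumed to be in frame, and r 2 is computed as the squared pearson correlation.

```python
def gc_by_position(seq):
    """Return (GC1, GC2, GC3): the GC fraction at each codon position.

    Assumes `seq` is an in-frame coding sequence; trailing bases that do
    not complete a codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

with one overall gc value and one gc 1 , gc 2 or gc 3 value per genome, calling r_squared on the two lists gives the kind of genome-level correlation described in the text.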
the observation that modification of the gc 3 in the optimized codons led to dramatic expression is not surprising because codon usage in other virus families is also related to gc content [12] . however poxviruses have two distinct features that make them unique. first they encode all the necessary transcription machinery within their virus factories in the cytoplasm and therefore do not rely on cellular components [13] . second, although members of the poxviridae infect a wide range of hosts including insects, birds, reptiles and mammals, with a few exceptions, individual poxvirus species have a narrow host range [11] . therefore we can expect that individual poxviruses have adapted to the molecular features of their particular host. the genomes are well conserved suggesting a common ancestor and they have been incredibly successful. this is likely due to the fact that they do not require residency within the nucleus but rather construct their own virus factories within the cytoplasm. there is a plasticity to the codon usage found in the poxviruses that does not necessarily reflect the common evolutionary history. we have examined three conserved poxvirus orthologs, which are predicted to function in a similar manner in members of the poxviridae including dna polymerase, p4a (the major core protein) and dna uracil glycosidase and which all pox members encode however there is variable codon usage between the pox species for the same genes. the biases appear related to genomic gc content. it has been suggested that codon bias reflects the level of gene expression and/or length of gene [14] however this does not seem to be supported in the poxviridae because the codon usage for the same orthologs are different depending on the poxviral member (fig. 5 ). it has also been suggested that codon bias could have evolved based on host requirements. 
however, this does not hold for mcv, which has a gc rich genome (63.9% gc, table 2), and variola virus, which is more at rich (33.4% gc, table 2), although both replicate exclusively in human tissues. perhaps the difference in gc concentration may be explained by the cell type or tissue in which the virus is resident. mcv is found exclusively in the keratinocytes of the dermis while variola virus can be found throughout the body including in the lymphatic system, respiratory system and blood [10]. the incongruence between the %at of a poxvirus genome and the at concentration of its host genomic dna has been noted before [15]. capripox- and parapoxviruses both infect ungulates (sheep, goats, antelopes); however, following the sequencing of 2.5 kb of capripoxvirus dna it was noted that the high at concentration (72.4%) of the capripoxvirus dna did not reflect the at concentration of the evolutionary hosts [15]. selected analyses have suggested that sheep and goats had at concentrations around 50%. as well, parapoxviruses, which share the same host range, have a viral genomic content of around 37% at [16]. complete sequence data now confirm these earlier estimates. the parapoxvirus genomes (orf and bpsv) are 34.9% at rich, while the capripoxvirus member (lsdv) is 73.3% at rich (table 2). unlike the situation in poxvirus genomes, analysis of the sars coronavirus and other members of the nidovirales indicated significant variation in codon usage bias among different genes within a species [17]. it was concluded that gc composition was the primary determinant of synonymous codon usage among these virus genes but the bias was manifested at the gene level rather than at the genome level [17]. a study on the codon usage in nucleopolyhedroviruses (npv), another family of large dsdna viruses, concluded that there was significant variation in codon usage by genes within the same virus. again this is different from what we are reporting for the poxviruses. 
however the npv study was based on six genes and we examined the complete complement of orfs. individual variation might be lost in the overall picture for the poxviruses. as well significant variation in codon usage in homologous genes encoded by different npvs was observed. this is similar to our observations with poxviruses. finally there was no correlation between level of gene expression and codon bias in npv or between gene length and codon bias, and patterns of codon usage appeared to be a direct function of gc content of the virus encoded genes [12] . this is consistent with our observations reported here. virus genes (2006) 33: 15-26 25 there are now examples from several virus families that indicate that alteration of the native codons will result in dramatically improved expression. in most cases the expression problem seems to be inappropriate codon usage. native human papillomavirus (hpv)-16 e5 utilized infrequently used codons in 33 of its 83 amino acids and was undetectable following transient transfection however once the sequence was optimized for more common codons, used in mammalian genes, expression increased 6 to 9-fold [2] . another hpv gene, l1, hampered by codon usage bias different from the host was corrected by codon optimization resulting in a 100-fold increase in expression levels [3] . in conclusion the members of the poxviridae have genomes with a wide range of gc content and this appears to regulate their codon usage bias. the codon bias does not seem to be related to the size of the genes or their expression level because the codon bias seems to be maintained within genomes but not between genera. optimizing codon usage has improved the transient expression of several pox genes in mammalian cells. 
based on the calculation of the effective codon number for all orfs from all complete genomes we would predict that the best species to study by transient expression of native genes should be from the parapox-, mollusci- and leporipoxvirus genera. however, those poxvirus members that are resistant to transient transfection and expression in human or non-human primate cells will likely benefit from codon optimization. baculovirus expression vectors: a laboratory manual analysis of codon usage fields virology fields virology proc. natl. acad. sci. usa acknowledgements gm is a canada research chair in molecular virology. this research was supported by cihr and ncic. we thank t. irvine for technical assistance and d. hall for administrative support. key: cord-265095-lf5j4ic7 authors: ten dam, edwin b.; pleij, cornelius w. a.; bosch, leendert title: rna pseudoknots: translational frameshifting and readthrough on viral rnas date: 1990 journal: virus genes doi: 10.1007/bf00678404 sha: doc_id: 265095 cord_uid: lf5j4ic7 ribosomal frameshifting on retroviral rnas has been proposed to be mediated by slippage of two adjacent trnas into the -1 direction at a specific heptanucleotide sequence. here we report a computer-aided analysis of the structure around the established or putative frameshift sites in a number of retroviral, coronaviral, toroviral, and luteoviral rnas and two dsrna yeast viruses. in almost all cases a stable hairpin was predicted four to nine nucleotides downstream of the shifty heptanucleotide. more than half of the resulting hairpin loops give rise to potential pseudoknotting with sequences downstream of this hairpin. especially in the case of the shifty heptanucleotides u uua aac and g gga aac, stable downstream pseudoknots are present. indications were also found for the presence of pseudoknots downstream of amber stop codons at readthrough sites in some retroviral rnas. 
translational frameshifting, although generally an abortive event during protein synthesis, is employed by various retroviruses to express the pol gene, encoding the reverse transcriptase and integrase. frameshifting occurs at a defined site in the overlap region of the gag and pol genes and results in the synthesis of a gag-pol fusion protein (1, 2). in some retroviral rnas, e.g., mouse mammary tumor virus (mmtv) rna (3, 4), a double-frameshift event takes place, leading to the expression of a third reading frame encoding a protease. the ribosome is generally shifted into the -1 reading frame, but a shift into the +1 frame has been noted in one case for the related retroviral-like transposon ty-1 (5, 6). studying the sequence requirements for ribosomal frameshifting during translation of rous sarcoma virus (rsv) rna, jacks and coauthors found indications for a mechanism in which two adjacent ribosome-bound trnas slip simultaneously by one nucleotide in the 5' direction at the site of frameshifting. comparison of the sequences at the known or suspected frameshift sites in reading frame overlaps of a number of retroviral rnas revealed a consensus heptanucleotide, consisting of a run of three a, u, or g residues followed by the tetranucleotide uuua, uuuu, or aaac (7). interestingly, in the case of rsv rna, the presence of such a shifty heptanucleotide appeared to be insufficient; an additional 147 nucleotides downstream of the frameshift site were also necessary for efficient frameshifting. evidence was obtained that the additional 147 nucleotides in rsv rna harbor a stable stem-loop structure. furthermore, deletion analysis revealed that, besides this stem-loop structure, a downstream stretch of 20 nucleotides is essential (7). the authors suggested that these nucleotides may be involved in the formation of a pseudoknot.
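the consensus heptanucleotide described above lends itself to a simple pattern search. the sketch below assumes that "a run of three a, u, or g residues" means three identical residues (aaa, uuu, or ggg), which matches examples such as u uua aac, g gga aac, and u uuu uua; the function name is hypothetical, and the search ignores reading frame, so hits still have to be checked against the gag frame by hand.

```python
import re

# Assumed consensus: three identical A, U, or G residues followed by
# UUUA, UUUU, or AAAC. The lookahead also reports overlapping candidates.
SHIFTY = re.compile(r"(?=((?:AAA|UUU|GGG)(?:UUUA|UUUU|AAAC)))")


def find_shifty_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, heptanucleotide) for every consensus match."""
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in SHIFTY.finditer(seq)]
```

candidate sites that depart from this consensus, such as g ggu uuu or u uua aau discussed for the plant viruses, are not reported by this pattern and would need a relaxed expression.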
however, for the human immunodeficiency virus (hiv-1), the stem-loop structure downstream of the frameshift site is dispensable, and efficient frameshifting can be mediated by a short sequence of 16 nucleotides around the frameshift site only. it was suggested that "retroviruses may divide in two broad classes, one using linear 'shifty' sequences (e.g., hiv) and the other using more elaborate mechanisms based on rna secondary structure (e.g., rsv)" (8). in this context it seemed of interest to examine the nucleotide sequences harboring frameshift sites in various overlap regions in more detail and to search for possible tertiary interactions. here we report the results of such a computer-aided search. besides the stable stem regions just downstream of the suspected frameshift sites already proposed (4, 9, 10), we also find strong indications for pseudoknotted structures downstream of the shifty heptanucleotide in more than half of the overlap regions examined, including those present in coronaviral rnas, some plant viral rnas, and a yeast dsrna virus. during the course of this work, strong experimental evidence was reported for the presence of a pseudoknotted structure downstream of the frameshift site in avian infectious bronchitis virus (ibv) rna. this pseudoknot was shown to be essential for efficient frameshifting of the ribosome in the orf1a/orf1b overlap region (11). these authors also proposed similar pseudoknots in a number of retroviral rnas, some of which are identical to the ones resulting from our analysis. an analysis of the region downstream of the site, where in some retroviral rnas efficient readthrough of an amber stop codon occurs, also revealed potential pseudoknotted structures. the overlap regions containing established or putative frameshift sites as tabulated (7) and 17 that have not been discussed before were folded into secondary structures using a program developed in our laboratory by abrahams et al.
(manuscript submitted for publication). this program is able to predict pseudoknotted structures involving hairpin loops, also coined h-type pseudoknots (12), and has been successfully applied for the prediction of a number of consecutive pseudoknots in the 5' noncoding region of foot-and-mouth disease virus (fmdv) rna (13) and of pseudoknots in various other viral rnas (pleij, unpublished observations). stretches of about 250 nucleotides surrounding the shifty heptanucleotide were analyzed. because only stem-loop structures downstream of the shifty sequence appear to be important (7, 11), we have focused mainly on the sequence at the 3' side of the frameshift site. some pseudoknots involving bulge loops or multibranched loops are not predicted by this program. we, therefore, searched for these structural elements by visual inspection of the sequences if the proper stem-loop structures were found, taking into account the rules that are imposed by the geometry of the rna-a double helix. the characteristics of rna pseudoknots and their prediction and detection have been reviewed (12, 14). to simplify the description of the various structural elements around the frameshift region we introduce the terminology given in fig. 1a. essential features are the heptanucleotide sequence sh, where the frameshift takes place, and the stem region s1 separated from sh by the spacer sp. if pseudoknotting involves a simple hairpin loop, as depicted in fig. 1a, it is fully defined by the connecting loops l1 and l2 and the other stem region or "tertiary interaction" s2. except for sh and sp, the symbols are derived from the nomenclature previously used to describe pseudoknots (12, 14). figure 1b shows a schematic presentation of the structure obtained after coaxial stacking of stem segments s1 and s2. a characteristic feature of the relatively simple pseudoknot illustrated in fig.
1a is that the hairpin loop sequence participating in the tertiary interaction borders directly on the stem region of the hairpin. we call this type of pseudoknot h. when examining the retroviral structures we have not restricted ourselves to a search for this particular type only, but have included pseudoknots that meet the more general definition: a structural rna element formed upon basepairing of nucleotides within a loop with nucleotides outside that loop (12). an example of a more complicated pseudoknot is the one proposed for rsv rna (see below). potential pseudoknots downstream of frameshift sites in retroviral rnas table 1 presents the results of a computer-aided examination of 38 overlap regions harboring established or putative frameshift sites (7). we have included a number of sites from the retro-, luteo-, corona-, and toroviral groups and two yeast viruses not discussed before. for nearly all sequences tested, the computer program predicted a very stable hairpin, starting four to nine nucleotides downstream of sh, in agreement with observations by others for rsv, mmtv, hiv-1, hiv-2, and simian immunodeficiency virus (sivmac) (4, 9). similar results were reported using a different rna secondary structure-predicting program (11). this particular stem s1 was the most stable stem present in the 250 nucleotides surrounding sh, except for the luteovirus barley yellow dwarf virus (bydv) (15) and the transposable element gypsy (16). in the latter two cases, it was the second best. in mouse intracisternal a particle (mouse iap), such a hairpin is found if a g-a mismatch in s1 is allowed (17). s1 was neither predicted by the program nor found by eye for the transposable element 17.6 (11, 18) and for the retrovirus sivagm (10). the latter has u uuu uua as the shifty sequence, and the absence of this hairpin is not surprising in view of the experimental results obtained for hiv-1 (8) (see discussion).
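the general definition quoted above (basepairing of nucleotides within a loop with nucleotides outside that loop) is equivalent to two base pairs that interleave rather than nest. the following is a minimal sketch of that test on a list of base pairs given as index tuples; the function names and the toy coordinates are hypothetical and are not taken from the program used in this study.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return all couples of base pairs (i, j), (k, l) with i < k < j < l."""
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    return [(a, b) for a, b in combinations(ordered, 2)
            if a[0] < b[0] < a[1] < b[1]]


def has_pseudoknot(pairs):
    """True if at least two base pairs interleave instead of nesting."""
    return bool(crossing_pairs(pairs))


# Toy example: a 3-bp stem S1 and a 2-bp tertiary interaction S2 formed by
# loop positions 5 and 6 with positions downstream of the hairpin.
s1 = [(0, 12), (1, 11), (2, 10)]
s2 = [(5, 18), (6, 17)]
```

a plain hairpin such as `s1` contains only nested pairs, whereas `s1 + s2` contains interleaved pairs and therefore qualifies as a pseudoknot under this definition.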
for hiv-2 and sivmac we have included for s1 the stems as proposed earlier (9). interestingly, pseudoknotted structures were predicted directly by the program. a typical result obtained for the gag-pro overlap of saids retrovirus-serotype 1 (srv-1) rna (20) is shown in fig. 2a. the pseudoknot predicted here is of the h type (see terminology), which is frequently observed in the noncoding regions of a number of other viral rnas (21, 22). the size of the connecting loops l1 and l2 meets the steric demands that result from stacking the two consecutive double-helical segments and forming a quasi-continuous helix. accordingly, the single a residue is sufficient for crossing the deep groove of this helix over 6 bp, comparable to the pseudoknotting in the leader of the gene 32 mrna of bacteriophage t4 (22). [table 1 footnotes: b, for definition see terminology section and fig. 1; sp, hl, l1, and l2 are given in number of nucleotides, s1 and s2 in number of base pairs. c, secondary structure as proposed (9). d, presence of substructure in hl, l1, or l2, respectively.] folding of the corresponding sequence in the gag-pro overlap of the closely related srv-2 and mason-pfizer monkey virus (mpmv) rna (23, 24) yields a fully identical pseudoknot (not shown), which may reflect its functional importance. the sequence conservation in all three viral rnas is absolute, however, which means that covariations in the stem regions, which generally provide support for the proposed structures, are lacking here. the gag-pol overlap region of feline immunodeficiency virus (fiv) contains a similar structure (25). inspection of the sequence in the corresponding overlap region in another type d retrovirus, smrv-h (26), suggests an sh, as indicated in fig. 2b. the sh is located at the same position, as can be concluded unambiguously from a sequence alignment. its sequence is identical to that of the three other type d retroviral gag-pro shs mentioned above.
the length of its sp is seven nucleotides, and its hairpin shows a strong resemblance to that of the other three related retroviruses (e.g., srv-1, fig. 2a). however, the formation of an h-type pseudoknot is no longer possible because of the g insertion in hl and a u-to-c substitution in the complementary sequence downstream of the hairpin. surprisingly, s1 now harbors a potential second sh, g ggc ccc, which in turn is separated by an sp of four nucleotides from the ideal h-type pseudoknot predicted by the program (fig. 2b). this possible second frameshift site might compensate for the loss of the pseudoknot after the first site. whether g ggc ccc can function as an sh sequence obviously remains to be seen, let alone whether a second frameshift site is indeed active in smrv-h rna. a similar situation was found in both siv rnas (see below). the formation of an h-type pseudoknot is also possible in equine infectious anemia virus [eiav (27), not shown], and another example is provided by human t-cell leukemia virus (htlv-i) rna in the pro-pol overlap (28). in this case, both s1 and s2 are exceptionally long. a stretch of ten nucleotides from the 16-membered loop is complementary to a sequence 20 nucleotides downstream of s1. the related simian t-cell leukemia virus (stlv-i) has a g-u pair in s2 substituted for an a-u pair (29). many more substitutions are present in htlv-ii (30). s1 appears to be completely conserved, but various substitutions are found in l1 and l2. the three substitutions at the 3' side of hl give rise to two mismatches, thereby shortening s2 and possibly interrupting the coaxial stacking on s1. similar deviations from what might be called the ideal h-type pseudoknot are encountered with a number of other retroviral rnas, such as mmtv (gag-pro), bovine leukemia virus (blv) (pro-pol) (31, 32), and visna virus (visna) (33) rna (not illustrated; see table 1 and ref. 11).
a number of hairpins downstream of sh have an hl consisting of less than six nucleotides and in fact are not suitable for pseudoknotting [see hiv-1 rna (8, 34, 35), htlv-i (gag-pro) and htlv-ii (gag-pro)]. the hairpin in the transposable element gypsy contains an eight-membered hl, but no possible pseudoknotting could be detected. non-h type pseudoknots can be more difficult to identify. an example is the one proposed for rsv rna (36). the program predicted the same secondary structure downstream of the frameshift site as proposed (7). we assume, however, that the bottom part of s1 does not play a role in the frameshifting event for reasons outlined in the discussion. visual inspection of the resulting hairpin revealed the potential pseudoknotting, as already described (11). in our view, this tertiary interaction can even be extended from 8 to 11 bp upon accepting the formation of a bulged u residue (not shown). the relative complexity of the structure in rsv rna is apparently not restricted to this rna, but is found in a number of other retroviral rnas with a similar hl of 30-60 nucleotides, often involved in internal hairpin formation themselves. since the computer program is unable to predict pseudoknots harboring such multibranched loops, hairpins have to be inspected by hand. in doing so, no potential pseudoknots could be detected in the case of mpmv (pro-pol) or the related srv-1 (pro-pol). in the pro-pol overlap of mmtv, a stretch of eight nucleotides of the loop (agccugua) was found to be complementary to a region just downstream of the hairpin (uacaggcu). the significance of this complementarity is doubtful, however, because part of the stretch in the loop is already involved in a small hairpin (results not shown). the group of retroviral rnas having an sh consisting of the heptanucleotide u uuu uua (e.g., hiv-1) shows other complexities. long and stable hairpins with a small hl were proposed for hiv-2 and sivmac (9).
we note here that in both cases the sequence agcccc, occurring in hl, is complementary to the sequence ggggcu, seven and nine nucleotides downstream of the stem, respectively (37, 38). the program, however, predicted alternative structures, probably due to the presence of a number of alternating g- and c-rich regions downstream of sh. the results obtained with sivmac and sivagm are puzzling, since the structure prediction suggested in both viruses a potential second sh, located 32 nucleotides downstream of the u uuu uua sequence. in sivagm a second sh sequence, a aau uuu, is present in the gag gene reading frame, while in sivmac a potential sh, u uuc ccc, is found at exactly the same position. both sh sequences are followed by a stem region after three nucleotides. the hairpin found downstream of u uuc ccc in sivmac rna is reminiscent of the one present in rsv rna and in some other retroviral rnas, and a potential pseudoknot interaction with the long, single-stranded region further downstream is possible (not illustrated). we note that the situation described here for both siv viruses is analogous to the one described for smrv-h rna in fig. 2b. for a member of another retrovirus subfamily, human spumaretrovirus (hsrv), surprisingly, no sh could be found in the overlap region, although the arrangement of its gag and pol genes suggests a -1 frameshift (39). furthermore, no stable hairpins, let alone pseudoknots like the ones found in the other overlap regions, were predicted. coronaviruses are plus-stranded rna viruses having large single-stranded rna genomes with replication strategies different from retroviruses. however, it was recently shown for the coronavirus ibv that the overlap of the two open reading frames (orf1a and orf1b) of the putative polymerase gene contains the shifty sequence u uua aac, which is followed downstream by a pseudoknotted structure (7, 10, 11).
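the complementarities noted in the text (agcccc with ggggcu in hiv-2 and sivmac, and agccugua with uacaggcu in mmtv) can be checked with a reverse-complement comparison. this is a minimal sketch with hypothetical function names; it considers watson-crick pairs only, so g-u wobble pairs and mismatches, which the authors allow in some stems, are not taken into account.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the strand that pairs antiparallel with the given RNA."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def can_pair(loop_segment: str, downstream_segment: str) -> bool:
    """True if the two segments form a perfect antiparallel duplex."""
    return reverse_complement(loop_segment) == downstream_segment.upper()
```

both pairs of sequences quoted in the text pass this check, which is the minimal requirement for the proposed s2 interactions.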
site-directed mutagenesis clearly demonstrated that the pseudoknot is involved in the very efficient frameshifting (25-30%). the program predicts essentially the same hairpin s1 as proposed by brierley and coauthors (11), but we propose a slightly different tertiary basepairing, which enables a better coaxial stacking of s1 and s2, and is typical for an h-type pseudoknot (fig. 3). this proposal is further supported by a comparison with the possible pseudoknot in the corresponding region of the related coronavirus mouse hepatitis virus (mhv) strain a-59 (40). covariations in both s1 and s2 already prove that the pseudoknot exists in both viral rnas. this is especially clear for s2, where three of these covariations are found, including the g-a pair in mhv. note that such a g-a pair also occurs in the otherwise perfect stem s1 in ibv. moreover, the shortening of the mhv stem s1 at the top is compensated by the formation of an extra base pair (bp) in stem s2. it is further remarkable that the loop of the mhv hairpin shows an insertion of the stop codon triplet uaa, just in phase with the upstream orf1a coding region. this insertion extends stem s2 with another 2 or 3 bp (see fig. 3). the single-stranded ug(u) stretch left may be just sufficient to cross the deep groove over 10-11 bp (14). we note that the 32-nucleotide-long connecting loop can be folded internally (not shown), but this does not interfere with the pseudoknotting itself. a similar structure is predicted for a member of the torovirus group, berne virus (bev) (40). luteoviruses are plant viruses that have single-stranded plus-sense rna genomes (41). recently, the complete nucleotide sequences of three members of this group have been determined (42-44). the putative viral rna-dependent rna polymerase gene of bydv is expressed by a -1 translational frameshift in the rather short overlap of 13 nucleotides.
it was proposed that the uuua just upstream of the stop codon signaled frameshifting, analogous to the phenomenon in some retroviruses and the coronavirus ibv (15). a possible stem-loop structure starting three nucleotides downstream from the uuua sequence was also presented. we here propose that the sh is formed by the heptanucleotide g ggu uuu, followed after five nucleotides by this stem (15). searching for possible pseudoknot formation by the 15-membered hl left open several possibilities for alternative, reasonably stable, secondary structures. a definitive proposal for the structure, therefore, cannot be offered. more rewarding was the analysis of the sequence of beet western yellows virus (bwyv) rna (42). a potential candidate for an sh sequence in the right frame in the corresponding overlap region was found: g gga aac at position 1553 to 1559. it is followed after five nucleotides by a short but stable hairpin, which can form an h-type pseudoknot (fig. 4). the nucleotide sequence of the closely related potato leaf roll virus (plrv) rna also has a potential sh. its position [1662-1668 in plrvwag (43) or 1768-1774 in plrvscot (44)] is identical to that of bwyv rna, as can be concluded unambiguously from aligning both plrv sequences with that of bwyv. its composition, however, is rather different: u uua aau in plrvwag and u uua aau/c in plrvscot. moreover, s1 in plrv is shortened by 1 bp, but even more striking is the substitution of the u residue in hl of bwyv for a c in the wageningen plrv rna sequence. this substitution weakens the pseudoknot structure, if existing at all. it is tempting to suggest that the pseudoknot requirement is relaxed because of the transition of sh from u uua aac to u uua aau (see discussion). yeast viruses l-a is a dsrna virus of saccharomyces cerevisiae. its nucleotide sequence revealed two open reading frames, orf1 and orf2, overlapping by 130 bases. orf2 is in the -1 reading frame with respect to orf1.
a possible sh, g ggu uua, is present in the overlap, followed after four nucleotides by a hairpin (45). seven nucleotides of the eight-membered hl are complementary to a stretch of nucleotides that are 11 bases downstream of s1, thus again forming a potential pseudoknot. another yeast dsrna virus, l1, has an identical structure in the overlap region (46). some retroviruses express their pol reading frame by suppressing an amber stop codon separating the gag and pol genes (47, 48). a glutamine is inserted at this site, as shown in two cases (49, 50). this very efficient suppression is caused by an intrinsic cis-acting component of the viral rna located within 300 nucleotides around the amber stop codon in ak virus (akv) (51). [fig. 5 caption, partial: corresponding hairpins of akv, mo-mlv, and m7. encircled residues indicate base changes with respect to felv. in these viruses hl is shortened by one c residue.] a stable hairpin with the uag codon in the loop was proposed as a secondary structure element that could play a role in the readthrough event (47, 52). recently, studies using site-directed mutagenesis around the gag-pol junction indicated that this stem-loop structure is important for virus activity (53). a similar hairpin is present in moloney murine leukemia virus (mo-mlv) (52). however, a role of this particular hairpin in the readthrough phenomenon is difficult to reconcile with the position of the amber codon in the loop region (see also 51). moreover, the hairpin appears not to be conserved in m7 baboon endogenous virus (m7) and feline leukemia virus (felv, results not shown), nor in spleen necrosis virus (51). the computer program predicted other stable hairpins around the amber codon of felv and m7, which were not conserved in the other rnas either. however, there is a structure motif that is conserved among all four viruses. we here note that one of the most stable hairpins possible in the entire felv genome occurs just downstream of the amber stop codon (54).
this hairpin is capable of forming a pseudoknot. the loop contains a long stretch of only cs at its 3' side, which can form a very stable s2 with six g residues 18 nucleotides downstream of the hairpin (fig. 5a). we emphasize the strong resemblance of this potential pseudoknot with some of those present in viral rnas showing translational frameshifting (see above). moreover, the distance from the uag stop codon to s1 (eight nucleotides) reveals another striking resemblance (see also discussion). comparison of the felv sequence with those of three other retroviral rnas, having established or putative suppressed amber codons (47, 52, 55), gives support to the proposed pseudoknot, though not in a decisive manner (fig. 5b). note again the single a residue in l1 of akv and m7, and also of mo-mlv, if c2255 indeed is a g residue, as was reported recently (53). the nucleotide sequence of felv (54) in fact points to frameshifting as the mechanism of pol expression, because the gag and pol open reading frames are overlapping by five nucleotides, with pol in the +1 reading frame with respect to gag. however, no signal for frameshifting can be found around the overlap region. no similarity to the 14-nucleotide sequence involved in +1 frameshifting in yeast ty elements (5, 6) is present upstream of the stop codon. also no putative sh can be found. removing 1 of the 10 consecutive c residues downstream of the amber codon enables a better alignment with the mo-mlv sequence and would give this region of the felv genome an organization similar to that of the other three type-c retroviruses (49, 50). this, and the presence of the pseudoknot in all four retroviruses discussed above, suggest that felv expresses its pol gene by a readthrough mechanism.
in this paper we have presented a computer-aided examination of the secondary structure and the potential pseudoknotting of the rna region downstream of putative or established ribosomal frameshift sites of various viral rnas. this search was inspired by the suggestion that a pseudoknot is involved in the frameshift event in the gag-pol overlap of rsv rna (7). our data, which include viral sequences not tabulated before, indicate that 26 of the 38 overlap regions studied here harbor potential pseudoknots. these pseudoknots are always found four to seven nucleotides downstream of a heptanucleotide sequence, where the translational frameshifting was demonstrated or supposed to take place. there are only three exceptions (eiav, fiv, and sivmac), where sp is nine, eight, and three nucleotides long, respectively. some of the pseudoknotted structures found were of the same type as described previously for noncoding regions of viral rnas, in which the stretch of nucleotides from hl basepairing with a complementary region outside this hairpin borders immediately on s1, enabling coaxial stacking of the two stem segments (pseudoknots of the h type, compare fig. 1). it is noteworthy that a stable hairpin downstream of the sh sequence was predicted for all overlap regions included in table 1, except for the retrotransposon 17.6 and sivagm. the failure to find a hairpin or pseudoknot in the overlap of the latter is consistent with the finding that the stable hairpin in hiv-1 rna is dispensable for frameshifting (8). these authors proposed that two broad classes of retroviral rnas exist, differing in their mechanism of frameshifting: one class using a short linear shifty sequence (like in hiv-1) and the other using rna secondary structure for efficient frameshifting (e.g., rsv and ibv). in principle, the second class may be divided in two subclasses: one harboring a hairpin, the other a pseudoknot.
our results suggest that a substantial, if not a major, part of the viral rnas listed in table 1 use a pseudoknotted structure for optimal shifting. a similar conclusion was recently presented by brierley and coauthors (11), who reported that 14 out of 22 sequences examined appeared to contain the potential for pseudoknot formation. these authors also provided strong experimental evidence that in ibv the pseudoknotted structure indeed is necessary for efficient frameshifting. it will be interesting to know if the same holds true for the majority of the viral rnas analyzed here. the question is, of course, what is it that makes a pseudoknot so suitable for inducing efficient frameshifting? we assume that it is not merely the formation of a structure more stable than a hairpin alone, because we were unable to find a correlation between the calculated stability of stem s1 (56) and the presence of a potential pseudoknot (results not shown). the number of base pairs in s1 was not found to be critical either. two structural features distinguish an rna pseudoknot from a classical rna hairpin: the two connecting loops l1 and l2, of which the bases point into the deep groove and away from the shallow groove, respectively (57), and the quasi-continuity of the double helix. which of these features induces the ribosome to shift into another reading frame remains to be established, however. another factor contributing to the extent of frameshifting could be the length of the spacer region, which varied between four and seven nucleotides. for ibv, changing sp from six to three or nine nucleotides, respectively, reduced or abolished frameshifting (11). spacing between sh and the structure involved in frameshifting thus appears to be critical. in this respect it is striking that the distance between the amber stop codon and the pseudoknot at readthrough sites is almost equal to sp in frameshifting.
the stem s1 as originally proposed for rsv rna is in fact an exception, in that it starts immediately downstream of sh, forming a 14-bp stem, including a bulged c residue (7). we have chosen to disrupt 5 bp up to the bulged c residue, which leaves an sp of six nucleotides. the latter value is in the range of that of all the viral rnas (see table 1). moreover, mutations in this six-nucleotide stretch did not alter the frameshift efficiency (7), which argues against the importance of the bottom part of the stem proposed. no correlation between the sp size and the presence of a pseudoknot became apparent in the present comparison, however. a comparison of the sequences of the sh heptanucleotides is more suggestive (see table 1). two sequences stand out in overlaps harboring a pseudoknot: g gga aac and u uua aac. it is tempting to suggest, therefore, that the sequence aaac, where the trna bound to the ribosomal a site is shifting, has to be followed by an elaborate rna structure. the first three nucleotides of the sh heptanucleotide, however, play an additional role, as can be concluded from the group with the a aaa aac sequence, which has some members for which potential pseudoknotting could not be established. an important factor to consider further may be the presence of c and g residues in the sh heptanucleotide, leading to more stable codon-anticodon interactions. in this case a longer stalling of the ribosome may be needed to increase the chance of the slippage event. such a longer stalling may be achieved by an extra structural feature downstream of sh. in this respect, it is interesting to see that pseudoknotted structures may be involved as well in the efficient readthrough of an amber codon. it is conceivable that a common basis for both mechanisms is the need for a stalling of the translating ribosome, which is provided by pseudoknotted structures for reasons we do not yet know.
if such a common basis is present, one can predict that a few changes in the nucleotide sequence around sh or the amber codon, respectively, could easily change a frameshifting viral rna into one suppressing an amber codon and vice versa. however, first more information is needed about the actual requirement for amber stop-codon suppression of a downstream stem-loop structure or pseudoknot. the same holds true for a large number of viral rnas having overlapping reading frames in which frameshifting occurs, despite the presently available data on rsv, hiv-1, and ibv. we would like to thank marianne huisman, mike mayo, wayne gerlach, peter bredenbeek, and willy spaan for communicating data prior to publication; jan pieter abrahams for making available the computer program; and jan van duin for stimulating discussions and reading the manuscript. key: cord-287748-co9j3uig authors: kobayashi, tomoya; murakami, shin; yamamoto, terumasa; mineshita, ko; sakuyama, muneki; sasaki, reiko; maeda, ken; horimoto, taisuke title: detection of bat hepatitis e virus rna in microbats in japan date: 2018-05-29 journal: virus genes doi: 10.1007/s11262-018-1577-9 sha: doc_id: 287748 cord_uid: co9j3uig several recent studies have reported that various bat species harbor bat hepatitis e viruses (bathev) belonging to the family hepeviridae, which also contains human hepatitis e virus (hev). the distribution and ecology of bathev are not well known. here, we collected and screened 81 bat fecal samples from nine bat species in japan to detect bathev rna by rt-pcr using hev-specific primers, and detected three positive samples.
sequence and phylogenetic analyses indicated that these three viruses were bathevs belonging to genus orthohepevirus d like other bathev strains reported earlier in various countries. these data support the first detection of bathevs in japanese microbats, indicating their wide geographical distribution among multiple bat species. bats are known to be natural reservoirs of various zoonotic viruses such as rabies virus, nipah virus, and severe acute respiratory syndrome (sars) coronavirus [1] [2] [3]. in addition, hepatitis e virus (hev)-like viruses have been detected in bats in several countries [4, 5]. bat hev (bathev) belongs to family hepeviridae, genus orthohepevirus and is a non-enveloped, positive-sense, single-stranded rna virus. orthohepevirus is divided into four species, orthohepevirus a-d [6, 7]. orthohepevirus a contains hev, which is a causative agent of human acute hepatitis. orthohepevirus b includes avian hev, which is associated with hepatitis-splenomegaly syndrome. orthohepevirus c is divided into c1 (rodent-hev) and c2 (carnivore-associated hev) genotypes. orthohepevirus d comprises bathev, which has 57.4-64.8% identity with hev [8]. recently, bathevs were detected in macrobats and microbats in a variety of countries [4, 5]. however, limited information is available about the distribution of bathevs in other regions and about their ecology. here, we surveyed several japanese microbat species to detect bathevs. to examine whether bathevs exist in japanese bats, we collected 81 bat fecal samples from nine bat species captured in five different prefectures of japan in 2015, with permission from the ministry of the environment, japan and the respective local governments (fig. 1a). bats were caught using a harp trap and kept in a pouch for an hour to check for signs of disease and to obtain fresh feces. none of the captured bats showed obvious signs of disease. the feces were placed in a medium containing antibiotics and frozen on dry ice.
we extracted rna from the fecal samples and performed rt-pcr using a primer set ( table 1) that was designed specifically against the conserved region of the rna-dependent rna polymerase (rdrp) of hev, to screen the hev genomes. two samples (bthev-ej1 and bthev-ej2) from the japanese short-tailed bat (eptesicus japonensis) and 1 sample (bthev-ps1) from the brown long-eared bat (plecotus sacrimontis) were found to be positive. although we attempted to sequence the entire genome of the bathevs, we failed to amplify the whole genome using rt-pcr. therefore, we amplified partial orf1, which encodes non-structural protein including rdrp, and entire capsid coding region of bthev-ej1, -ej2, and -ps1 (corresponding to nucleotides (nt) 2171-6690, nt 3585-6690, and nt 3847-6690 of bathev/bs7, respectively) using specific primers ( table 1 ). the sequences of bthev-ej1 and bthev-ej2, which were collected in the same site on different days, were 99% identical (nt 2904 out of 2915). the identity between bthev-ej1/-ej2 and bthev-ps1 using a part of the rdrp was about 75% at the nucleotide level, suggesting the presence of multiple bathevs in japan. blast analysis indicated that bthev-ej1/-ej2 showed the highest sequence identities to bathev/bs7, a german strain detected from the serotine bat (eptesicus serotinus), among strains previously reported in other countries. on the other hand, bthev-ps1 showed the highest sequence identity to bthev/nms098b, a german strain detected from the daubenton's bat (myotis daubentonii). in particular, 82% identity observed between bthev-ej1/-ej2 and bathev/bs7 or 77% identity observed between bthev-ps1 and bthev/nms098b were greater than that observed between japanese strains. these data suggest that similar viruses exist in geographically distant regions. we then phylogenetically analyzed the sequences by maximum-likelihood analysis using clustalw and mega version 7.0 [9] . 
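the pairwise identities quoted above (for example, 2904 identical positions out of 2915 between bthev-ej1 and bthev-ej2) are simple ratios over aligned columns. the following is a minimal sketch of that calculation, not the authors' procedure: the function name `percent_identity` and the rule of skipping any column that contains a gap are assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# The count reported in the text: 2904 identical nucleotides out of 2915.
reported_identity = 100.0 * 2904 / 2915
```

with the reported counts, `reported_identity` evaluates to roughly 99.6%, consistent with the "99% identical" statement in the text.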
a phylogenetic tree constructed using the partial amino acid sequences of rdrp indicated that the japanese viruses were included in orthohepevirus d (fig. 1b) , demonstrating that all bathevs are classified in this species. we also amplified the full-length capsid (orf2) sequences by rt-pcr and analyzed them phylogenetically. the resulting tree confirmed that the novel japanese viruses were included in orthohepevirus d (fig. 1c) . the sequences of bthev-ej1/-ej2 were placed in a position neighboring the german bathev/bs7 strain, confirming the phylogenetic similarity between these strains. all bats captured in this study were insectivores and hibernate in winter. although there is no information about migration of e. japonensis and p. sacrimontis, bat species closely related to them were reported to migrate only a few kilometers from their colonies per night [10, 11] . they had different ecology in the terms of the habitat. bthev-ej1 and -ej2 were detected on different days from two e. japonensis bats, both of which used eaves of the same house as night roost in nagano. although the e. japonensis bats formed mixed colony at the roost with myotis ikkonikovi and rhinolophus ferrumequinum bats, bathevs were only detected in e. japonensis, implying bthev-ej1/-ej2 might have a narrow host range. p. sacrimontis bats usually form a small colony without other species of bats. indeed, p. sacrimontis bats, from which bthev-ps1 was detected in this study, were captured near such small colony in a ruin in aomori. thus, bthev-ps1 is likely circulating in the p. sacrimontis since the bats have low opportunity to come in contact with other species of bats. the closely related bathevs (bthev-ej1/-ej2 and bat hev/bs7) have been detected in different species of eptesicus bats (e. serotinus and e. japonensis). 
since the distribution areas of these bats are not overlapping, viruses ancestral to bthev-ej1/-ej2 and bathev/bs7 might have infected ancestral eptesicus and might have branched into different species in the process of evolution. for virus isolation, we inoculated the rt-pcr-positive fecal samples not only into several bat cells (bkt, fbkt, and demkt1 cells) but also into other mammalian cell lines (madin-darby canine kidney (mdck), african green monkey veroe6, human a549, madin-darby bovine kidney (mdbk), and swine pk15 cells) since we suspected that the bat fecal samples may contain several pathogens other than bathevs. all inoculated cells were incubated for 12-15 days with media changes at 2-3 day intervals. after the incubation, cells were blindly passaged three times. however, we could neither recover any infectious viruses nor detect bathev rna in the inoculated cells by rt-pcr. in conclusion, the present study showed the presence of several bathev strains, which were independently classified into orthohepevirus d, in japanese bats, suggesting wide geographical distribution of bathev among multiple bat species. although these data suggest limited transmissibility of bathev to other animals, further studies are needed to determine its zoonotic potential.

to obtain fresh feces, bats were kept in a pouch for an hour, and the feces were then collected by a sterilized cotton bud and transferred to 1 ml of dulbecco's modified eagle's medium (dmem) supplemented with 100 u/ml of penicillin, 1 mg/ml of streptomycin, 100 µg/ml of gentamycin, and 2 µg/ml of amphotericin. the feces were suspended well and then centrifuged at 10,000×g for 15 min at 4 °c. the supernatants were used for rna extraction with isogen ls reagent (nippon gene). cdna was synthesized using prime script rt reagent kit (takara bio) with a mixture of random hexamer and oligo dt primers. pcr amplifications were performed using the kod fx neo (toyobo) with consensus hev primer sets (panhev f and r), which were designed in this study to amplify a 191-bp fragment of the rna-dependent

acknowledgements we thank mr. mitsuru mukohyama for helping us

key: cord-284675-7zv449sc authors: yew, tan do; bejo, mohd hair; ideris, aini; omar, abdul rahman; meng, goh yong title: base usage and dinucleotide frequency of infectious bursal disease virus date: 2004 journal: virus genes doi: 10.1023/b:viru.0000012262.89898.c7 sha: doc_id: 284675 cord_uid: 7zv449sc

base usage and dinucleotide frequency have been extensively studied in many eukaryotic organisms and bacteria, but not for viruses. in this paper, a comprehensive analysis of these aspects for infectious bursal disease virus (ibdv) is presented. the analysis of base usage indicated that all of the ibdv genes possess equivalent overall nucleotide distributions. however, when the base usage at each codon position was analysed by cluster analysis, the vp5 open reading frame (orf) formed a different cluster isolated from the other genes. the unusual base usage of the vp5 orf may indicate that the gene originated by the virus "overprinting strategy", a strategy in which a virus may create a novel gene by utilizing the unused reading frames of its existing genes. meanwhile, the gc content of the ibdv genes and the chicken's coding sequences was comparable, suggesting that the virus imitates the host to increase its translational efficiency. the analysis of dinucleotide frequency indicated that the ibdv genome had dinucleotide bias: the frequencies of cpg and tpa were lower, and that of tpg was higher, than expected. the classical methylation pathway, a process in which cpg is converted to tpg, may explain the significant correlation between the cpg deficiency and tpg abundance.
"principal component analysis of the dinucleotide frequencies" (df-pca) was used to analyse the overall dinucleotide frequencies of the ibdv genome. df-pca on the hypervariable region and polyprotein (vpx-vp4-vp3) gene showed that the very virulent ibdv (vvibdv) was segregated from other strains, which meant that vvibdv had a unique dinucleotide pattern. in summary, the study of base usage and dinucleotide frequency had unravelled many overlooked genomic properties of the virus.

infectious bursal disease (ibd) is an immunosuppressive disease that affects young chickens and is characterized by the destruction of the bursa of fabricius. reviews of the disease have been published elsewhere [1] [2] [3] [4] [5]. ibd is caused by infectious bursal disease virus (ibdv), which is a double-stranded rna (dsrna) virus [6, 7]. ibdv belongs to the genus avibirnavirus [8] under the birnaviridae family. other genera of birnaviridae are aquabirnavirus and entomobirnavirus [8]. the ibdv genome consists of two segments, designated as segment a and b [6, 7]. the genome is enclosed within a nonenveloped icosahedral capsid approximately 60 nm in diameter [9]. the complete nucleotide sequence of segment a is 3,261 bp [10] and contains two open reading frames (orfs) of 3,036 bp [11] and 438 bp, respectively, in which the smaller orf partially overlaps the larger one at the 5′ end [12]. the large orf encodes a precursor polyprotein (nh2-vpx-vp4-vp3-cooh), which is autoproteolytically processed by the cis-acting viral protease vp4 into vpx (48 kda), vp3 (32 kda), and vp4 (24 kda) [13]. vpx, as a precursor protein, undergoes a second independent proteolytic processing step to yield a smaller matured product known as vp2 [14]. vp2 and vp3 form the viral capsid [15]. highly conformational epitopes present in the vp2 protein are responsible for the production of neutralizing antibody that protects the chicken from ibdv infection [16, 17].
vp3 is the minor structural protein recognized by the nonneutralizing antibodies [18, 19] and can efficiently bind to ssrna and dsrna [20]. the small orf in segment a encodes the vp5 protein, whose function is unknown [21]. vp5 might be important in pathogenesis [22] but is not essential for viral replication and infection [22, 23]. vp5 might also be involved in the release of viral progeny from infected cells [24]. the vp5 gene overlaps the vpx gene from its 35th nucleotide; therefore almost all of its nucleotides lie within vpx. segment b (2,827 bp [10]) consists of a single orf that encodes vp1 (90 kda), an rna-dependent rna polymerase [25] [26] [27] with capping activities [28]. it has been reported that the polymerases of birnaviruses form a defined subgroup of polymerases that lack a gdd motif [29]. the formation of vp1-vp3 complexes plays a critical role in ibdv replication [30]. there are two serotypes of ibdv, namely serotype 1 and 2 [31, 32]. in addition to serological classification, ibdv strains are also grouped according to their virulence (mortality and bursal lesions) [5]. the very virulent ibdv strain (vvibdv) can cause up to 100% mortality and severe bursal lesions in specific-pathogen-free (spf) chickens [33, 34]. the classical virulent strain (cvibdv) may cause bursal damage and mortality up to 30% [35]. chickens infected by the variant strain (vaibdv) may rapidly develop bursal atrophy without the inflammation phase [36], but the mortality caused by vaibdv can be less than 5% [5, 37]. the attenuated strain (atibdv) is usually derived from the attenuation of a cvibdv isolate and is typically used as a vaccine; however, despite being attenuated, it may still be capable of causing lesions in the bursa [37]. a newly emerged atypical strain (ayibdv) that has unusual amino acid substitutions in the vp2 gene has also been documented [38] [39] [40].
meanwhile, the serotype 2 isolates are usually isolated from turkeys and are apathogenic to both chickens and turkeys [18]. ibdv has also been classified based on its sequence characteristics, such as the presence of certain restriction enzyme sites and unique amino acid residues in its vp2 gene [40] [41] [42]. the diversity of the ibdv strains has complicated the control and prevention of ibd; for example, birds vaccinated against a cvibdv strain may not have adequate protection against other strains [43, 44]. therefore, analysis of the common genomic properties of the various ibdv strains will contribute greatly towards the understanding of the virus and the subsequent control and prevention efforts. although many sequence analysis papers had been published, the base usage and dinucleotide frequency of ibdv remained unexplored. by studying base usage, it was found that the genomic gc content of flaviviruses was associated with their vector specificity [45]. in thermophilic bacteria, high genomic gc content had been associated with greater genomic stability (stronger bonding of g-c pairs compared with a-t pairs) as a result of evolutionary adaptation to the hot environment [46]. for human immunodeficiency virus (hiv) and other lentiviruses, unknown mechanisms had driven these viruses to a strong bias for the adenine nucleotide [47]. non-random dinucleotide biases of the genome constitute a "general design" or genomic signature [48] [49] [50] [51]. the genomic signature reflects the dna properties in terms of its stacking energies, modification, replication, and repair mechanisms [51]. moreover, the genomic signature is useful for the detection of pathogenicity islands in bacterial genomes [51]. generally, cpg (or 5′-cg-3′) and tpa dinucleotides are scarce [52] [53] [54]. cpg deficiency is typically associated with the classical methylation pathway, in which susceptible cpg dinucleotides are methylated and subsequently converted to tpg [55].
tpa dinucleotides are unfavourable because the ua in mrna is susceptible to rnase activity [56]. furthermore, avoiding tpa dinucleotides might reduce the occurrence of stop codons, since two of the three stop codons (taa and tag) contain tpa. this paper unveils several fundamental characteristics of the ibdv genome. the base usage at each codon position is described. the information extracted from the base usage was then utilized to investigate the origin of the overlapping vp5 gene. comparison of the viral gc content with that of the host gave an insight into the virus-host interaction. the viral dinucleotide frequencies and their significance are also discussed.

all ibdv sequences (433 sequences), except the patented sequences, were downloaded from genbank release 131.0. duplicated sequences, non-coding sequences, and sequences with unresolved/ambiguous sites were discarded. sequences were then grouped into eight groups according to the different regions of the ibdv genome, namely the vp1 (n = 25), vpx (36), vp2 (40), vp3 (34), vp4 (35), vp5 (28), polyprotein gene (33), and hypervariable region (hvr) (130) groups. other sequences that could not fit into the groups were excluded from the analysis. selected sequences were edited and aligned by using bioedit software version 5.0.9 [57] and clustalx software [58]. since most of the genbank ibdv entries did not clearly state which strain (pathotype) the isolates belonged to, strain identification was done manually by extensive literature search rather than merely on the basis of molecular markers. among the ibdv sequences in genbank, only a few of the isolates had been completely sequenced, whereas the majority had not. since the grouping of the sequences was based on different regions of the ibdv genome, and since a fully sequenced isolate covers all of the regions of the genome, an isolate might be included in several groups simultaneously.
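the screening step described above (discarding duplicated sequences and sequences with unresolved or ambiguous sites) can be sketched as follows. this is an illustrative sketch only: the function name `filter_sequences`, the record layout, and the placeholder identifiers in the usage note are assumptions, and the manual steps in the text (removing non-coding entries, assigning pathotypes from the literature) are not modelled.

```python
def filter_sequences(records):
    """Keep unambiguous, non-duplicated nucleotide sequences.

    records: iterable of (identifier, sequence) pairs (hypothetical layout).
    A sequence is dropped if it is empty, contains any symbol other than
    A, C, G, T (for example N or R), or repeats a sequence already kept.
    """
    seen = set()
    kept = []
    for identifier, seq in records:
        seq = seq.upper().replace("U", "T")  # treat RNA notation as DNA
        if not seq or set(seq) - set("ACGT"):
            continue  # unresolved or ambiguous site
        if seq in seen:
            continue  # exact duplicate of an earlier entry
        seen.add(seq)
        kept.append((identifier, seq))
    return kept
```

for example, `filter_sequences([("s1", "ATGGCC"), ("s2", "ATGNCC"), ("s3", "atggcc")])` keeps only the first record: the second contains an ambiguous site and the third duplicates the first.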
meanwhile for most isolates, only their hvrs were sequenced and therefore they only formed part of the ''hvr group''. there was 1 serotype 2 isolate in vp5 dataset whereas 2 isolates in all other datasets. in summary, regardless of the groupings, the nucleotide sequences of 131 ibdv isolates were analyzed. the accession numbers of these isolates were: ab024076, af006694, af006695, af006696, af006697, af006698, af006699, af051837, af051838, af051839, af076223, af076224, af076225, af076226, af076227, af076228, af076229, af076230, af076231, af076232, af076233, af076234, af076235, af076236, af083094, af091097, af091098, af091099, af109154, af121256, af133904, af140705, af155123, af159207, af159208, af159209, af159210, af159211, af159212, af159214, af159215, af159216, af159217, af159218, af165149, af165150, af165151, af194428, af240686, af247006, af260317, af262030, af279287, af279288, af281651, af303219, af321054, af321055, af321056, af362747, af362771, af362773, af362776, af413069, af413070, af413071, af413072, af413073, af413074, af413075, af413076, af416620, af416622, af416623, af416624, af416625, af416626, af427103, af454945, af464901, af498628, af498629, af498631, af498632, af498633, af527039, aj001941, aj001942, aj001943, aj001944, aj001945, aj001948, aj238647, aj245885, aj245886, aj249517, aj249519, aj249520, aj249523, aj249524, aj277801, aj310185, ay029166, ay115569, ay115570, ay134874, d00499, d00867, d00868, d00869, d49706, d83985, l42284, m64285, m66722, m97346, x03993, x54858, x84034, x89570, x92760, x95883, y14955, y14956, y14957, y14962, y14963, y18612, y18682, z25481 and z25482. hosts (chicken and turkey) genomic coding sequences were obtained from codon usage database (http://www.kazusa.or.jp/codon/) genbank release 129.0. bursal est database [59] was referred to identify the highly expressed genes specifically found in the b-cells of the bursa of fabricius. 
since the database was constructed using a non-normalized cdna library, the most frequently identified chicken (gallus gallus) genes will be the most abundantly (or highly) expressed genes in the bursa [59]. in addition, highly expressed bursal genes from other sources [60] were also included. therefore the 28 highly expressed genes used in the analysis were ribosomal (16 sequences), heat shock (two sequences), elongation factor 1-a, b-actin, ig rearranged light-chain vjc, chicken germ line ig light chain, dead-box rna helicase, non-histone chromosomal protein hmg-17, mhc b complex, atf4, bu-la, and chbl genes. all of the sequences were downloaded from genbank, meticulously edited, intron-excised, and analysed for gc content.

base usage and dinucleotide frequencies were calculated by using codonw 1.3 (software by john peden, available at ftp://molbiol.ox.ac.uk/win95.codonw.zip) and dambe version 4.0.98 (by xuhua xia, available at http://web.hku.hk/~xxia/software/installation.htm). both programs were used concurrently to ensure high reproducibility. data editing and the various analyses (correlation, cluster analysis, and principal component analysis) were done by using microsoft excel 2002, statistica version 6, and spss version 11 software. the overall base usage was calculated for each virus gene. in addition, base usage at the first (p1), second (p2), and third (p3) codon positions was also computed. similarly, dinucleotide frequency was calculated for each of the reading frames (1:2, 2:3, 3:1) and as the overall measurement (at all codon positions). the dinucleotide index (dni) was computed as the ratio of observed ($O_d$) to expected ($E_d$) dinucleotide frequencies:

$$\mathrm{DNI} = \frac{O_d}{E_d}$$

the expected frequency ($E_d$) of a dinucleotide at sites p1 and p2 was calculated as

$$E_d = P(n_1) \times P(n_2)$$

where $P(n_1)$ and $P(n_2)$ were the proportions of the nucleotides $n_1$ and $n_2$ at p1 and p2, respectively. if there was no dinucleotide bias, the dni value would be 1.
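the dinucleotide index defined above can be illustrated with a short python sketch. it is not the codonw or dambe implementation: the function names `site_pairs` and `dni`, the frame labels, and the choice to estimate each nucleotide proportion from the same set of adjacent site pairs are assumptions made for illustration.

```python
def site_pairs(seq, frame="all"):
    """Return adjacent nucleotide pairs for a given frame.

    frame "1:2" pairs codon positions 1 and 2, "2:3" pairs positions 2 and 3,
    "3:1" pairs position 3 with position 1 of the next codon (intercodon),
    and "all" uses every adjacent pair in the sequence.
    """
    seq = seq.upper()
    if frame == "all":
        starts = range(len(seq) - 1)
    else:
        offset = {"1:2": 0, "2:3": 1, "3:1": 2}[frame]
        starts = range(offset, len(seq) - 1, 3)
    return [(seq[i], seq[i + 1]) for i in starts]


def dni(seq, dinucleotide, frame="all"):
    """Dinucleotide index: observed frequency / (P(n1) * P(n2))."""
    pairs = site_pairs(seq, frame)
    if not pairs:
        raise ValueError("sequence too short for the requested frame")
    n1, n2 = dinucleotide.upper()
    total = len(pairs)
    observed = sum(1 for a, b in pairs if a == n1 and b == n2) / total
    p_n1 = sum(1 for a, _ in pairs if a == n1) / total  # proportion at first site
    p_n2 = sum(1 for _, b in pairs if b == n2) / total  # proportion at second site
    expected = p_n1 * p_n2
    if expected == 0:
        return float("nan")  # index undefined when a nucleotide is absent
    return observed / expected
```

a value below 1 indicates under-representation and a value above 1 over-representation; for instance `dni(cds, "CG", frame="3:1")` gives the intercodon cpg index of a coding sequence `cds`.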
base usage of serotype 1 ibdv genes

base usage, or the relative distribution of each nucleotide (a, t, g, and c) at each codon position, was calculated for the vp1, vpx, vp2, vp3, vp4, and vp5 genes. subsequently, a rank of 1 (least frequently used) to 4 (most frequently used) was assigned to each nucleotide according to its relative base usage percentage. the base usage patterns became pronounced after shading the higher ranks (rank 3 and 4, coloured grey) versus the lower ranks (rank 1 and 2, non-coloured), as shown in table 1. generally, base usage at each codon position (p1, p2, and p3) would not be equal because the base usage of coding sequences is not random. moreover, base usage at p1 and p2 is constrained by the encoded amino acids. indeed, only 4% of p1 mutations are synonymous and all p2 mutations are non-synonymous [61]; this results in the inflexibility of base usage at p1 and p2. however, p3 was expected to have a more variable base usage because 69% of p3 mutations are silent [61]. referring to table 1, thymine (t) was the least preferred nucleotide at p1. considering that all stop codons begin with t (taa, tag, and tga), avoidance of t at p1 was understandable as a way to prevent the unwanted occurrence of stop codons in the viral coding sequence (cds). except for the vp5 gene, guanine (g) was comparatively high at p1. this showed the inclination of ibdv to encode aliphatic amino acids (alanine, valine, and glycine). intriguingly, the general base usage patterns at p1 were comparable for all ibdv genes. at p2, all viral genes had g as the lowest nucleotide except vp5, which had t as the lowest. the deficiency of g at p2 might be attributed to the virus' efforts to prevent the occurrence of stop codons. unlike p1, base usage at p2 was more varied because any p2 mutation will alter the encoded amino acid.
in this case, maintaining the physicochemical properties of the virus proteins, most probably through evolutionary forces, would be more important than maintaining a similar base usage. at p3, all viral genes were deficient in t, excluding the vp4 and vp5 genes. in addition, c (cytosine) appeared to be the preferred nucleotide. the bias towards c was an interesting feature because most p3 mutations are silent [61]. base usage bias at p3 might confer certain selective advantages on ibdv; perhaps by having the bias, the virus would be able to match its codon usage with that of the host. if so, the virus may improve its translational efficiency, and this may lead to increased fitness. meanwhile, it was suggested that favouring c at p3 would increase the coding ability or new orf formation, considering that none of the stop codons contains a c nucleotide [62]. however, the dearth of the g nucleotide in the vp4 gene remained to be investigated. unexpectedly, the overall (total) base usage of all ibdv genes was similar, despite some discrepancies at each codon position. moreover, although physically separated on another segment, vp1 still resembled the other genes. it was also found that c and a (adenine) were the most preferred nucleotides, whereas t was the least preferred. given that rna viruses have a high mutation rate [63] and short generation time, why did the virus maintain a similar base usage pattern for all its genes? perhaps this could be the virus' strategy to optimise its gene expression. it had been shown that a virus could take advantage of codon composition to regulate its own programs of gene expression [64] while utilizing the cellular machinery to replicate its genome. base usage of the serotype 2 genome was analyzed separately because only two isolates (oh and 23/82) were available from genbank 131.0. results indicated that the base usage of serotype 2 was comparable to that of serotype 1 (data not shown). as in serotype 1, the vp5 gene of serotype 2 had a peculiar base usage pattern.
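the base usage tabulation and ranking discussed above (percentages of a, t, g, and c at p1, p2, and p3, then ranks from 1 to 4) can be sketched as below. this is a minimal illustration rather than the codonw or dambe procedure: the function names are hypothetical, trailing bases that do not complete a codon are ignored, and ties are broken arbitrarily by dictionary order, which the text does not specify.

```python
def base_usage_by_position(cds):
    """Percentage of A, T, G, C at codon positions p1, p2 and p3."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    if usable == 0:
        raise ValueError("need at least one complete codon")
    usage = {}
    for pos in range(3):
        column = cds[pos:usable:3]  # every base at this codon position
        usage[f"p{pos + 1}"] = {
            base: 100.0 * column.count(base) / len(column) for base in "ATGC"
        }
    return usage


def rank_bases(percentages):
    """Rank bases from 1 (least used) to 4 (most used).

    Assumption: ties keep the A, T, G, C dictionary order.
    """
    ordered = sorted(percentages, key=percentages.get)
    return {base: rank for rank, base in enumerate(ordered, start=1)}
```

applying `rank_bases` to each of `usage["p1"]`, `usage["p2"]`, and `usage["p3"]` reproduces the kind of rank table described for table 1, where ranks 3 and 4 would be the shaded cells.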
further analysis of the non-overlapping region of vp5 (nolvp5) (11 codons, 34 bp) revealed that although its p3 was also rich in c (>30%), it was richest in t (33.7%), which differed from the other genes. these findings were in agreement with a previous report [62] in which overlapping genes showed significant bias in their base usage. to study the relationships among the virus genes, cluster analysis was performed on the nucleotide compositions of the virus genes. the virus genes were treated as the 'columns' (seven columns: vp1, vpx, vp2, vp3, vp4, vp5, and nolvp5) and the nucleotide compositions (presented as mean percentages) at each codon position were treated as the 'attributes' in q-type cluster analysis. since there were three codon positions and four types of nucleotides, there were 12 attributes: for example, the percentage of adenine at p1, the percentage of guanine at p1, and so forth up to the percentage of cytosine at p3. squared euclidean distances were then computed and a tree was constructed using the unweighted pair-group average (upgma) amalgamation rule (fig. 1). cutting the tree at a linkage distance of 0.05, it was clear that the vp5 gene and its non-overlapping region formed different clusters compared with the other viral genes. this led us to suspect that the peculiar base usage of the vp5 gene was due to its origin; most likely it originated by overprinting the 'original' (or existing) viral genes. to generate a novel gene, a virus may either need to synthesize an entirely new nucleotide sequence or, alternatively, it may utilize the unused reading frames of the existing genes, a process first proposed by grasse, who called it "overprinting" [65]. in tymoviruses, an overlapping gene arose by overprinting the "original" replicase gene after the virus had diverged from its sister groups from a common ancestor [66].
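the clustering procedure described above (squared euclidean distances between 12-attribute base usage profiles, joined by the upgma rule and cut at a linkage distance) can be sketched in plain python. the sketch below is a hedged illustration, not the statistica implementation used in the study; the function names and the toy two-attribute profiles in the usage note are assumptions, and real profiles would carry the 12 attributes per gene.

```python
from itertools import combinations


def squared_euclidean(u, v):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def upgma(profiles):
    """Average-linkage (UPGMA) clustering.

    profiles: dict mapping a gene name to its attribute vector.
    Returns the merges in order as (cluster_a, cluster_b, linkage_distance).
    """
    base = {
        frozenset((a, b)): squared_euclidean(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    clusters = [frozenset([name]) for name in profiles]

    def average_distance(c1, c2):
        # unweighted mean over all pairs of original members
        total = sum(base[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


def cut_tree(profiles, merges, cutoff):
    """Clusters obtained by applying only merges at or below the cutoff."""
    clusters = {frozenset([name]) for name in profiles}
    for c1, c2, distance in merges:
        if distance > cutoff:
            break  # UPGMA linkage distances never decrease
        clusters -= {c1, c2}
        clusters.add(c1 | c2)
    return clusters
```

with toy profiles such as `{"geneA": [0.30, 0.20], "geneB": [0.31, 0.20], "geneC": [0.10, 0.50]}`, cutting at 0.05 leaves geneA and geneB together and geneC on its own, which mirrors how vp5 and nolvp5 separate from the other genes in fig. 1.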
in the birnaviridae family, the vp5 gene was found only in avibirnavirus (ibdv) and aquabirnavirus (infectious pancreatic necrosis virus or ipnv). the other genus, entomobirnavirus (drosophila x virus or dxv), had no equivalent orf overlapping the 5′ terminus of vpx [67]. for dxv, the predicted overlapping non-structural protein (believed to be a vp5 homolog) resides between the vp4 and vp3 genes. with regard to the evolution of birnaviruses, the most parsimonious explanation appeared to be that the polyprotein gene was the birnaviruses' "original gene" and that the vp5 gene arose after the vertebrate birnaviruses (ibdv and ipnv) and the insect birnavirus (dxv) had diverged from their common ancestor. it was unlikely for dxv to initially possess a vp5 gene, to lose it after the divergence, and then to create another new orf in order to replace the lost gene's function. due to the frame shift of an overprinting gene, the gene will have an unusual codon usage and encode a new protein with physicochemically biased properties [62]. the vp5 protein had been shown to play a role in ibdv pathogenesis [22] and in the release of viral progeny from infected cells [24]. vp5-defective virus had exhibited a slight delay in replication [22], but the vp5 gene was not essential for virus replication in vitro [23] or in vivo [22, 68]. simply put, the acquisition of the vp5 gene as a "new gene" by the overprinting strategy during birnavirus evolution, although the gene is inessential, may have given the virus certain survival advantages that led it to retain vp5 in its genome. the gc content (gc%) of many double-stranded dna (dsdna) viruses differed markedly from the gc content of the host cells they infected [69]. to investigate whether the same phenomenon applies to ibdv (dsrna), we compared the gc% of the virus with that of the host (table 2). results showed that the overall gc% of the ibdv genome was comparable to that of the chicken (gallus gallus), at around 52-53%.
interestingly, in spite of the high mutation rate of the hypervariable region, its gc% nearly matched the gc% of the host's highly expressed genes. similarly, the gc% of segment b was very close to that of the chicken's highly expressed genes. meanwhile, the gc% of serotype 2 differed more from that of turkey (meleagris gallopavo) than from that of chicken, although serotype 2 isolates are usually isolated from turkey. the reason for this discrepancy remained to be answered. a general pattern of gc% for both virus and host was observed: high gc% at p1, low at p2, and high at p3. these findings would suggest that the virus attempts to mimic the host gc%, particularly the p3 gc%, probably in order to optimise its codon usage for translational efficiency and to continue to thrive as a successful intracellular parasite. in contrast to the dsdna viruses, the gc% of ibdv and its host was comparable. apart from the cpg islands in mammalian genomes, cpg dinucleotides are usually under-represented, for two main reasons. first, the classical methylation pathway converts cpg to tpg [55]. the pathway works by methylating the 5′ cytosine of cpg and subsequently deaminating the 5-methylcytosine, which mutates cpg to tpg [55]. second, cpg dinucleotides exhibit the greatest thermodynamic stacking energy of all dinucleotides [70, 71]; therefore, reducing their frequency might facilitate nucleic acid replication and transcription [72]. thus, it was of interest to investigate whether the ibdv genome was also deficient in cpg dinucleotides. to study the dinucleotide frequencies of ibdv, three datasets were analysed, namely the polyprotein gene (vpx-vp4-vp3), hypervariable region, and segment b sequences. the vp5 gene was excluded because it was highly conserved (14/28 isolates have identical sequences) and most of its nucleotides were embedded within the vpx gene.
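the gc% comparison discussed above (overall gc content and gc content at p1, p2, and p3 for viral and host coding sequences) reduces to simple counting. the following sketch assumes an in-frame coding sequence with unambiguous bases; the function names and the short example sequence are illustrative and are not taken from the study.

```python
def gc_percent(seq):
    """GC content of a nucleotide sequence, in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_codon_position(cds):
    """Overall GC% plus GC% at codon positions p1, p2 and p3.

    Assumption: cds starts in frame; an incomplete trailing codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("need at least one complete codon")
    result = {"overall": gc_percent(cds[:usable])}
    for pos in range(3):
        result[f"p{pos + 1}"] = gc_percent(cds[pos:usable:3])
    return result
```

for the toy sequence `"ATGGCGTAA"` this gives about 44.4% overall, 33.3% at p1 and p2, and 66.7% at p3; averaging such values over a gene set gives figures comparable to those in table 2.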
the null hypothesis in this study was that there was no selective pressure against cpg dinucleotides, meaning that all dinucleotide pairs had an equal chance of occurrence given the base composition. the mann-whitney u test was used to test whether the frequency of cpg dinucleotides deviated significantly from the expected proportion. results from table 1 showed that p3 and p1 were highest in c and g, respectively. thus, if there was no dinucleotide bias, one would expect a high frequency of cpg dinucleotides at the intercodon position (p3:p1). however, results from the analysis of the three datasets showed that dinucleotide bias did occur: the observed intercodon cpg dinucleotides were significantly lower than expected (p < 0.01). this showed the avoidance of cpg dinucleotides in the ibdv genome. this finding was in accordance with karlin et al. [73], who found that virtually all small eukaryotic viruses were deficient in cpg dinucleotides. meanwhile, the tpg intercodon dinucleotide frequency was significantly higher than expected (p < 0.01). further analysis of the dinucleotide frequency at all possible codon positions gave the same results, with cpg lower and tpg higher than expected. moreover, tpa dinucleotides were also found to be lower than expected (p < 0.01). the dearth of tpa could be due to the susceptibility of ua in mrna to rnase activity [56] (but see [74]). tpa is also less energetically stable than all other dinucleotides [70, 71], which renders the nucleic acid more flexible in bending and untwisting. this explains why tata sequences at sites of replication origin are very easy to unwind and to interact with other molecules [75]. hence, the restriction of tpa dinucleotides may help in avoiding inappropriate binding of cellular factors to the viral nucleic acids.
furthermore, given that two out of three stop codons contain tpa dinucleotides, reducing the genomic tpa dinucleotides would certainly help in avoiding the occurrence of unwanted mutation-derived stop codons. the relationship between cpg and tpg dinucleotides was studied further by correlation analysis. for each dinucleotide pair, the dinucleotide index (dni) was calculated as the ratio of observed to expected dinucleotides. results indicated that the number of cpg dinucleotides was negatively correlated with that of tpg dinucleotides. the r-values for the segment b and polyprotein datasets were −0.803 and −0.815 (p < 0.0001), respectively. the correlation for the hvr dataset (r = −0.406, p < 0.0001) was, however, weaker, probably due to its shorter sequence. we were fully aware that correlation does not imply causation, but based on the known pathway and our empirical results, we concluded that the deficiency of cpg probably contributed to the abundance of tpg in the ibdv genome through the conversion of methylated cpg to tpg [55]. the vertebrate immune system has apparently evolved the ability to recognize unmethylated cpg motifs and responds with a rapid and coordinated cytokine response leading to the induction of humoral and cell-mediated immunity [76, 77]. moreover, cpg-based adjuvant has been shown to trigger protective antiviral cytotoxic t cell responses [78]. therefore, we proposed that by avoiding cpg dinucleotides, ibdv might be able to minimize its antigenicity and avoid an undesirable host immune response. from a different perspective, we suggested the use of cpg-based adjuvant in ibd killed vaccine, considering the virus' avoidance of cpg dinucleotides. it had been shown that cpg oligonucleotides could be a valuable adjuvant for poultry vaccines [79]. thus, the potential usage of cpg-based adjuvant in ibd killed vaccine may be of future research interest. classifying ibdv strains was indispensable for the control and prevention of ibd.
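the correlation between the cpg and tpg indices reported above is an ordinary pearson correlation computed across isolates. the sketch below shows the coefficient only (no p-value); the function name `pearson_r` and the toy dni values in the usage note are illustrative assumptions, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

for example, with one cpg index and one tpg index per isolate, `pearson_r([0.4, 0.5, 0.6], [1.3, 1.2, 1.1])` returns a value of −1 up to rounding error, the extreme case of the negative relationship described in the text.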
apart from pathological and serological classification, ibdv had been grouped by its sequence characteristics [40, 42], where each ibdv strain had its own characteristic restriction enzyme sites [41] and molecular markers [40]. ibdv dinucleotide usage (or dinucleotide patterns) was however unknown, despite the many sequence analysis papers on the ibdv genome that had been published. in coronaviruses, analysis of dinucleotide frequency had separated the viruses into two groups that roughly reflect their taxonomic origins [80]. thus, the current study was to investigate whether dinucleotide patterns differed among the ibdv strains and the practicality of the ''principal component analysis of the dinucleotide frequencies'' (df-pca) approach in studying the ibdv dinucleotide patterns. dni was calculated for each of the 16 types of dinucleotide pairs. since dni was a relative measure of dinucleotide frequency, pca rather than correspondence analysis was used in the analysis [81]. the concepts and principles of pca have been extensively described in most multivariate analysis textbooks, so they will not be discussed here. all the datasets (hypervariable region, polyprotein and segment b) were analysed by the df-pca approach. for the hypervariable region and polyprotein datasets, three outliers, namely the australian cvibdv (00/273) and serotype 2 (oh, 23/82) isolates, were excluded because of their unique sequence characteristics. results of df-pca were depicted as a plot in which the axes represent the amount of ''extracted variation'' (fig. 2). in fig. 2a, the first two axes accounted for 52% (35.84% + 15.34%) of the total variance; in other words, they explained 52% of the total variation observed in the dinucleotide patterns of the hypervariable region. noticeably, there were two distinct groups separated along the first axis: a very virulent group on the left and an attenuated group on the right. other strains remained in between the two major groups.
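the df-pca idea (a dni vector of 16 values per sequence, followed by pca) can be illustrated with a minimal, dependency-free sketch. several points are assumptions rather than the authors' procedure: the dni is computed here as the observed dinucleotide frequency divided by the product of the two base frequencies, the pca is done on the covariance matrix by power iteration with deflation, and the names `dni`, `pca` and the toy sequences are hypothetical. the software and settings actually used in the study are not stated in this section.

```python
from itertools import product

BASES = "acgt"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 dinucleotide pairs


def dni(seq):
    """Dinucleotide index per pair: observed frequency / (f_x * f_y) (assumed form)."""
    seq = seq.lower().replace("u", "t")
    n = len(seq)
    base_f = {b: seq.count(b) / n for b in BASES}
    pairs = [seq[i:i + 2] for i in range(n - 1)]          # overlapping dinucleotides
    values = []
    for d in DINUCS:
        expected = base_f[d[0]] * base_f[d[1]]
        observed = pairs.count(d) / (n - 1)
        values.append(observed / expected if expected > 0 else 0.0)
    return values


def pca(rows, n_axes=2, iters=1000):
    """Return (scores, explained) for the leading axes of the covariance matrix."""
    m, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(k)]
    x = [[r[j] - means[j] for j in range(k)] for r in rows]  # centred data
    cov = [[sum(xi[a] * xi[b] for xi in x) / (m - 1) for b in range(k)]
           for a in range(k)]
    total = sum(cov[j][j] for j in range(k))                 # total variance
    axes, explained = [], []
    for _ in range(n_axes):
        v = [1.0 / (j + 1) for j in range(k)]                # deterministic start vector
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
        lam = 0.0
        for _ in range(iters):                               # power iteration
            w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
            norm = sum(c * c for c in w) ** 0.5
            if norm < 1e-12:                                 # no variance left
                lam = 0.0
                break
            v = [c / norm for c in w]
            lam = norm
        axes.append(v)
        explained.append(lam / total if total > 0 else 0.0)
        # deflation: remove the variance captured by this axis
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(k)] for a in range(k)]
    scores = [[sum(xi[j] * ax[j] for j in range(k)) for ax in axes] for xi in x]
    return scores, explained


# toy sequences (hypothetical), one dni vector per sequence
toy = ["acgtacgtaaccggtt", "aaccttggacacgtgt", "atatgcgcatgcatgc"]
scores, explained = pca([dni(s) for s in toy])
print([round(e, 3) for e in explained])
```

each row of `scores` gives the position of one sequence on the first two axes, which corresponds to a point on a df-pca map; `explained` corresponds to the percentage of variation reported per axis.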
there was no clear separation between classical and attenuated strains. this was probably because many attenuated isolates originated from the attenuation of classical isolates. the bold capital v and a were the okym (vvibdv) and okymt (attenuated form of okym) isolates, respectively [82]. interestingly, there appeared to be a subtle ''right-shift'' of okym towards the attenuated strains after the attenuation process, but not to the extent of total separation from the vvibdv cluster. while the impact of attenuation on ibdv's dinucleotide patterns remained to be investigated, the failure of okymt to fall within the atibdv cluster reflected that df-pca was in fact influenced by the virus's evolutionary relationships. however, there was no evidence that ibdv isolates situated on the extreme left would be the ''most virulent'' vvibdv or that the extreme right isolate would be the ''most attenuated'' atibdv. incorrectly classified isolates could be quickly detected on the df-pca graph due to their odd positions. it was found that the classifications of the zj2000 (genbank accession no. af321056) and gz902 (af006699) isolates were inappropriate. zj2000 was reported as a highly virulent ibdv [83] but its position in the graph (fig. 2a and 2b) seemed to be related more to the attenuated or classical strains than to the vvibdv strain. to examine this problem closely, sequence analysis of zj2000 was done. it was found that none of the important vvibdv markers (242ile, 256ile, and 294ile) [40] and the serine-rich heptapeptide virulence marker ''swsasgs'' [84] were present in zj2000. in addition, zj2000 had 253his and 284thr, which were more closely related to the attenuated strain than to the virulent strain [40]. for gz902 (''variant strain''), its hypervariable region sequence was found to be identical to that of another attenuated strain, gz29112 (af051837), and it was located at exactly the same position in the map (the circle in fig. 2a was the location of both gz902 and gz29112).
sequence analysis of both isolates found that gz29112 was grouped correctly whereas gz902 should be grouped as an attenuated strain according to the molecular markers. fig. 2b and c showed the df-pca results for the polyprotein and segment b datasets. the first two axes of the polyprotein and segment b datasets explained about 57% and 66% of total variation, respectively. we found that df-pca on hypervariable region sequences could yield results comparable to those from the longer polyprotein gene sequences. this was probably because the molecular determinants of virulence, cell tropism, and pathogenic phenotype of ibdv all fall within the hypervariable region [85]. meanwhile, atypical isolates (upm94/273 and k310) were located close to the vvibdv isolates as shown in fig. 2b. this was understandable because the atypical strain was considered a subset of the vvibdv strain [86]. the vp1 gene had an intricate dinucleotide pattern (fig. 2c) in which different ibdv serotypes and strains were intermingled with each other on the graph. intriguingly, rather than forming an isolated cluster, serotype 2 isolates (oh and 23/82) were located near the cvibdv and atibdv isolates. in addition, the vvibdv isolates (sh95 and habin-1), cvibdv isolate (00/273), and atibdv isolate (il4) had unique dinucleotide patterns whereby they did not belong to any significant cluster. these findings disagreed with islam et al. [87], in which the vp1 gene of vvibdv was distinctly separated from other strains. perhaps this was because the number of sequences used in this study (n = 25) was larger than islam's (n = 18). new vvibdv isolates such as sh95 (ay134875) and habin-1 (af455136) were not included in the previous study. furthermore, the df-pca approach differed from the phylogenetic approach because df-pca analysed the inter-relationships of the 16 dinucleotide pairs, whereas the phylogeny method (specifically the distance method) calculated evolutionary distances based on a chosen substitution (or evolutionary) model.
the substitution model chosen by islam and co-workers in the construction of their vp1 phylogenetic tree was however not stated in their report. from a different viewpoint, it should be remembered that ibdv is a bisegmented virus, and whether the bewildering dinucleotide patterns of the vp1 gene were due to inter-strain gene reassortment remained to be investigated. df-pca was not intended to be a substitute for the current strain classification methods, even though it had some ability to group the ibdv strains. in this study we used df-pca to demonstrate the unique characteristics of each ibdv strain by its dinucleotide frequency. df-pca analyzed the delicate inter-relationships among the dinucleotide pairs and visually projected the results in the form of a graph or ''map''. the result of the df-pca analysis was not solely dependent on the sequence identity percentage, albeit this was an important factor. for example, although okym shared 92.9% sequence identity with both the f9502 and zj2000 isolates, zj2000 was located far away from okym in comparison with f9502 (see fig. 2a). although many underlying biological properties of df-pca remained to be investigated, we believed that the results of df-pca reflected the evolutionary history of the virus, considering that each dinucleotide pair was influenced by evolutionary forces (and thus constituted the genomic signature). in phylogenetic analysis, particularly with clustering algorithms, evolutionary relationships were studied by grouping the taxa into various groups or clades. with regard to ibdv, these clades usually reflect the strain of the virus; for example, very virulent isolates are grouped together but not with the variant isolates. therefore, a taxon must be either in or out of a given clade. in contrast, by using df-pca, the inter-relationships among the ibdv isolates were visually displayed as ''points'' on the graph rather than as distinct clusters.
thus, df-pca allowed shades of grey and may promote further insight into the virus's evolutionary history. the virus genome is packed with information and it means everything for the virus's survival. in this study we had uncovered many genomic properties of ibdv by analysing its base usage and dinucleotide frequency. we envisaged that a similar approach could be adopted to study the genes of other viruses and so improve the understanding of their fundamental properties. virus taxonomy: seventh report of the international committee on the taxonomy of viruses; base usage and dinucleotide frequency, 2nd international congress/13th vam congress and cva-australasia/oceania regional symposium, vet assoc malaysia; fundamental of molecular evolution; evolution of living organisms. this work was generously supported by irpa grant 54091 from the malaysian government. key: cord-279863-5kxgu4t9 authors: oem, jae-ku; an, dong-jun title: phylogenetic analysis of bovine astrovirus in korean cattle date: 2013-11-23 journal: virus genes doi: 10.1007/s11262-013-1013-0 sha: doc_id: 279863 cord_uid: 5kxgu4t9 bovine astrovirus (bastv) belongs to a genetically divergent lineage within the genus mamastrovirus. the present study showed that bastv was associated with the gastroenteric tracts of cattle in nine positive fecal samples from 115 cattle, whereas no positive samples were found in the brain tissues of 14 downer cattle. interestingly, the positive diarrheal samples were obtained mainly from calves aged 14 days–3 months. bayesian inference tree analysis of the partial orf1ab and capsid (orf2) gene sequences of bastvs identified four divergent groups. eleven bastvs, four porcine astroviruses, and two deer astroviruses (dastvs; ccastv-1 and -2) belonged to group 1; group 2 contained two bastvs (bastk08–51 and bastk10–96) with another two in group 3 (bastk08–2 and bastk08–53); and group 4 comprised the bastv-neuros1 strain derived from a cattle brain tissue sample and an ovine astrovirus.
the same divergent groups were obtained when the pairwise alignments were produced using both amino acid and nucleotide sequences. the korean bastvs isolated from infected cattle had a nationwide distribution and they belonged to groups 1, 2, and 3. electronic supplementary material: the online version of this article (doi:10.1007/s11262-013-1013-0) contains supplementary material, which is available to authorized users. astroviruses are single-stranded positive-sense rna viruses with genomes approximately 6.4-7.3 kb in length. the family astroviridae comprises two genera: mamastrovirus infects mammals and avastrovirus infects birds [1]. human astrovirus was first reported in children with diarrhea in 1975 [2] and mamastroviruses were found subsequently in a variety of wild hosts, including sheep, cow, pig, dog, cat, red deer, mouse, mink, bat, cheetah, brown rat, roe deer, sea lion, dolphin, and rabbit [3-16]. bovine astrovirus (bastv) was one of the first astroviruses [17] to be discovered and it has been isolated in the usa and the uk. two bastv serotypes were established based on the results of a virus neutralization assay [18]. the genomic characterization and sequence analysis of astroviruses in bovine fecal specimens collected in hong kong provided evidence of potential recombination in orf2 [19]. recently, the complete genome of a novel bastv associated with neurological disease in cattle, boastv-neuros1, was sequenced and found to be phylogenetically related to an ovine astrovirus (oastv). a previous study suggested that boastv-neuros1 infection was a potential cause of neurological disease in cattle [20]. however, genetically diverse lineages of bastvs have not been identified in many countries because there have been few studies of astroviruses derived from cattle.
thus, the present study investigated the genetic groupings of korean bastvs and examined their relationships with the age of cattle infected with bastvs. in total, 115 fecal samples were collected from cattle with confirmed or suspected diarrheal disease at cattle farms throughout korea between january 2008 and december 2010. the cattle comprised 64 calves aged < 30 days and 51 cattle aged > 30 days. based on the fecal condition, 91 samples were from animals with diarrhea and 24 from nondiarrheic animals. the cattle comprised 84 korean cattle and 31 holstein cattle. nonambulatory cattle, which are commonly referred to as ''downer'' cattle, are unable to stand or walk. cattle brain tissue samples with a histopathological diagnosis of encephalitis were also collected from 14 downer cattle between 2010 and 2012. of the 14 brain samples collected from downer cattle, seven were positive for akabane virus and two for bovine viral diarrhea virus (bvdv), whereas no pathogenic agent for encephalitis was detected in the remaining five. viral rna was extracted from the feces using trizol ls, according to the manufacturer's instructions. bastv was detected in fecal specimens by rt-pcr using a specific primer set for the orf1ab and orf2 regions of bastv (bastv-f: 5′-gtgtttggcatgtgggtyaarcc-3′ and bastv-r: 5′-rtcvyybktggtggt-3′), which were designed based on known strains deposited in genbank (accession no. hq916313-hq916317). the rt-pcr process amplified a 965-nt long fragment at 42°c for 30 min, 94°c for 5 min, 94°c for 40 s, 51°c for 40 s, and 72°c for 1 min, followed by 40 cycles using virus-specific conditions. the bastv associated with neurological disease in cattle was also detected in brain tissue specimens by rt-pcr, as described previously [20]. products with the expected size were cloned using the pgem-t vector system ii™ (promega, cat. no. a3610, usa).
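the two primers above contain iupac ambiguity codes (y, r, v, b, k), so each one stands for a pool of concrete oligonucleotides. the short sketch below counts and enumerates that pool using the standard iupac nucleotide code table; the function names `degeneracy` and `expand` are hypothetical helpers, and the primer strings are copied exactly as printed in the text.

```python
from itertools import product

# standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for ch in primer.lower():
        count *= len(IUPAC[ch])
    return count


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[ch] for ch in primer.lower()]
    return ["".join(combo) for combo in product(*choices)]


bastv_f = "gtgtttggcatgtgggtyaarcc"   # as printed in the text
bastv_r = "rtcvyybktggtggt"           # as printed in the text

print(degeneracy(bastv_f), degeneracy(bastv_r))
print(expand(bastv_f))
```

enumerating the pool is practical only for low degeneracy, which is why `degeneracy` is computed separately from `expand`.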
the cloned gene was sequenced with t7 and sp6 sequencing primers using an abi prism® 3730xl dna sequencer at the macrogen institute (macrogen co. ltd). the sequences of all the bastv-positive samples were submitted to genbank under accession numbers kf668444-kf668452. nine of 115 fecal samples from korean cattle were positive for bastv and all of the bastvs were related to diarrhea. however, bastv was not detected in 14 cattle brain tissue samples. although bastv was first reported in england in 1987 [17], the association between bovine astroviruses and gastroenteric diseases in cattle is still not clear. previous studies reported that bastv is not associated directly with severe diarrheic disease in calves under natural conditions [17, 19, 21, 22]. in the present study, nine korean bastvs were associated with clinical diarrhea in cattle, and calves aged < 1 month accounted for 77.8 % of cases (table 1). a previous study showed that bastvs were excreted by 60-100 % of calves on farms [21], while a recent study of rectal swab samples from asymptomatic adult cattle showed that only 2.4 % (5/209) contained bastv [19]. this is probably because bastvs are more frequent in young calves than in adult cattle. to investigate the relationships between astroviruses and other bovine viruses that cause diarrhea in cattle, a screening test was conducted using specific primers for the detection of bovine rotavirus (brv) [23], bovine coronavirus (bcv) [24], bvdv [25], and bovine kobuvirus (bkv) [26], as described previously. co-infections with other viruses were associated with the clinical symptoms of diarrhea in only two cases: the bastk08/1 strain derived from a 20-day-old calf was co-infected with brv and the bastk10/31 strain from a 14-day-old calf was co-infected with both brv and bvdv (table 1).
although the association between bkv and diarrhea or gastroenteritis is unclear, bkv was found as a co-infection with six of the korean bastvs, the exceptions being bastk08/51, bastk08/53, and bastk10/35 (table 1). in cattle, two astrovirus serotypes have been recognized based on serological investigations, i.e., boastv-1 and boastv-2 infections [18], and recent phylogenetic analyses support the classification of bastvs and the newly discovered astroviruses in roe deer (ccastv) under the proposed mamastrovirus genocluster gi [14, 19]. all of the astrovirus sequences were aligned using the clustal x alignment program [27]. the nucleotide sequences were translated and the shared nucleotide and amino acid sequence identities among the astrovirus strains were calculated using bioedit 7.053 [28]. the analysis of the diversity of bastvs in the present study identified four table). bayesian trees were generated with mrbayes 3.1.2 [29, 30] using best-fit models, which were selected with mrmodeltest 3.7 [31] for nucleotide sequences and prottest 1.4 [32] for amino acid sequences. markov chain monte carlo analyses were run using 2,000,000 generations for each nucleotide and amino acid sequence. the best-fit model of the orf1ab nucleotide sequence selected by mrmodeltest 3.7 software was trnef+g, according to the results of a hierarchical likelihood ratio test. the likelihood parameter was set to nst = 6 and rate = gamma for the datasets, and the gamma distribution shape parameter was 0.5612. the substitution model rmat was 1. and -2 [19]. recently, the boastv-neuros1 strain was detected in the brain tissues of cattle and the analysis of its genetic diversity showed that it was most closely related to the oastv prototype, which was identified in 1977 [3], whereas it was phylogenetically distant from a recently reported oastv [33] and the hong kong bastvs [20]. this suggests the occurrence of multiple cross-species transmission events among hosts and other animal species.
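the shared sequence identities mentioned above were calculated with bioedit; how that program treats gapped columns is not described here. as a rough illustration only, the following sketch computes percent identity between two rows of an existing alignment, under the assumption that columns where both sequences have a gap are skipped and a gap opposite a residue counts as a mismatch. the function name `percent_identity` and the toy alignment rows are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumptions: gap/gap columns are ignored; gap/residue columns are mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a.lower(), b.lower()):
        if x == gap and y == gap:
            continue                      # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# toy aligned rows (hypothetical)
print(percent_identity("acgt-a", "acga-a"))
```

the same function works for aligned amino acid sequences, since it only compares characters column by column.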
however, it appears that the histopathological findings of encephalitis in korean downer cows were not associated with the detection of boastv-neuros1 in brain tissue. the bi analysis of the partial orf1ab and/or orf2 genes also showed that all of the known bastvs could be separated into four groups (fig 1a, b), in the same way as the diversity analysis. group 1 of the bi tree contained six hong kong bastvs and five korean bastvs, groups 2 and 3 included only korean bastvs, and the boastv-neuros1 strain was the only member of group 4. in conclusion, the present study identified four bastv groups based on the phylogenetic analysis and their shared pairwise amino acid sequence identities. the bastv detection rate in cattle feces was higher in calves aged < 1 month compared with adult cattle. thus, continuous surveillance of novel diversity in bastvs should be conducted on many cattle farms throughout the world because of the risk of emerging astroviruses associated with neurological disease in cattle. [fig. 1a, b: bayesian inference trees of the partial orf1ab and orf2 sequences of bovine, porcine, deer, wild boar and human astroviruses; taxon labels omitted] virus taxonomy. eighth report of the international committee on taxonomy of viruses acknowledgments we are grateful to dr. soo-kyung joo for technical assistance. key: cord-272693-432ixb7g authors: phillips, j. e.; jackwood, m. w.; mckinley, e. t.; thor, s. w.; hilt, d. a.; acevedol, n. d.; williams, s. m.; kissinger, j. c.; paterson, a. h.; robertson, j. s.; lemke, c.
title: changes in nonstructural protein 3 are associated with attenuation in avian coronavirus infectious bronchitis virus date: 2011-09-10 journal: virus genes doi: 10.1007/s11262-011-0668-7 sha: doc_id: 272693 cord_uid: 432ixb7g full-length genome sequencing of pathogenic and attenuated (for chickens) avian coronavirus infectious bronchitis virus (ibv) strains of the same serotype was conducted to identify genetic differences between the pathotypes. analysis of the consensus full-length genome for three different ibv serotypes (ark, ga98, and mass41) showed that passage in embryonated eggs, to attenuate the viruses for chickens, resulted in 34.75–43.66% of all the amino acid changes occurring in nsp 3 within a virus type, whereas changes in the spike glycoprotein, thought to be the most variable protein in ibv, ranged from 5.8 to 13.4% of all changes. the attenuated viruses did not cause any clinical signs of disease and had lower replication rates than the pathogenic viruses of the same serotype in chickens. however, both attenuated and pathogenic viruses of the same serotype replicated similarly in embryonated eggs, suggesting that mutations in nsp 3, which is involved in replication of the virus, might play an important role in the reduced replication observed in chickens leading to the attenuated phenotype. electronic supplementary material: the online version of this article (doi:10.1007/s11262-011-0668-7) contains supplementary material, which is available to authorized users. avian coronavirus infectious bronchitis virus (ibv) causes a highly contagious upper respiratory tract disease in chickens. live attenuated vaccines are used against the virus but the disease is difficult to control because cross-protection does not usually occur between different serotypes. the respiratory disease caused by this virus can be mild to moderate and can vary depending on the breed of chicken infected as well as the strain of the virus [1] . 
the virus is worldwide in distribution, and in addition to chickens, ibv has been isolated from peafowl (galliformes), and other gamma-coronaviruses have been isolated from teal (anas crecca), geese (anserinae), pigeons (columbiformes), and ducks (anseriformes) [2]. coronaviruses are enveloped viruses in the order nidovirales and are classified based on genome organization and antigenic characteristics as alpha (previously group 1), beta (previously group 2), and gamma (previously group 3)-coronaviruses with the avian coronaviruses belonging to the gamma-coronaviruses. subgroups within each group have been reported, and recently, comparative full-length genome analysis placed a novel coronavirus from a beluga whale in subgroup 3b and three new coronavirus isolates from passerine birds in subgroup 3c [3]. infectious bronchitis virus and related isolates as well as turkey coronavirus (tcov) are assigned to subgroup 3a. coronaviruses have a single-stranded positive-sense rna genome ranging in size from 27 to 30 kb, with a 5′ cap and a 3′ poly-a tail. transcription occurs through a leader-primed rna synthesis mechanism that results for ibv in six 3′ co-terminal subgenomic mrna molecules. four structural proteins-spike (s), envelope (e), membrane (m), and nucleocapsid (n)-along with the viral rna make up the enveloped virion. the n protein binds to the viral rna forming the ribonucleoprotein (rnp) complex. the e and the m protein are membrane bound proteins that play a role in virus assembly [4]. the s glycoprotein on the surface of the virus mediates attachment to the host cell, is responsible for fusion of the host cell membrane and viral envelope, and in ibv, it contains epitopes that define serotype and induce neutralizing antibodies [5].
the s glycoprotein of ibv is post-translationally cleaved into s1 and s2 subunits, and the s1 subunit is reported to have three hypervariable regions [6] [7] [8] . mutations, insertions, deletions, and recombination in s contribute to the genetic diversity of ibv, which is recognized as different genetic or serologic types of the virus [5] . two polyproteins 1a and 1ab account for approximately two-thirds of the viral genome-coding region and make up the replication transcription complex (rtc). the polyprotein 1ab is translated through a-1 frame-shift translation mechanism that occurs approximately 20-40% of the time [9] . the ibv 1a and 1ab polyproteins are post-translationally cleaved into 15 nonstructural proteins (nsps), nsps 2 through 16 by a papain-like protease (plp) and the main protease (mpro), also referred to as the 3c-like protease [10] . ibv does not have an nsp 1 equivalent found in some other coronaviruses. the plp contained within nsp 3 is divided into pl1 and pl2 papain-like proteases. the pl1 protease, present in other coronaviruses, is truncated and nonfunctional in ibv, thus pl2 cleaves nsps 2, 3, and 4 [11] . the mpro contained within nsp 5 cleaves nsps 5 through 16 . the biological characteristics of many nsps have been previously reported [9, 10, [12] [13] [14] [15] [16] [17] . in addition to nsps 3 and 5, which contain proteases pl2 and mpro, respectively, nsps 2, 4, and 6 contain hydrophobic residues predicted to play a role in anchoring the rtc to the golgi. nonstructural proteins 7, 8, 9, and 10 are reported to have rna-binding activity. nonstructural protein 11/12 is the rna-dependent rna-polymerase, nsp 13 is a rna helicase, nsp 14 is an exoribonuclease, nsp 15 is an endoribonuclease, and nsp 16 is a methyltransferase. adaptation of ibv to different hosts has been associated with changes in the s glycoprotein, suggesting that spike plays a key role in pathogenicity [18, 19] . 
however, the ectodomain of the s glycoprotein from the beaudette strain of ibv, an attenuated laboratory strain, was replaced with an s from a pathogenic strain (mass 41 strain) of the same serotype. this chimeric virus was shown to induce an immune response but remained nonpathogenic in chickens, indicating that the s glycoprotein is not solely responsible for pathogenicity of ibv [2, 20] . in another study, a chimeric ibv was created with the replicase genes 1a and 1ab from the attenuated beaudette strain, and all of the structural genes from the pathogenic mass 41 strain including the s gene. this chimeric virus was not pathogenic in chickens, indicating that the replicase proteins also appear to be determinants of ibv pathotype [2, 21] . genetic differences reported in 1a and s between virulent and avirulent strains of ibv also led others to suggest that the replicase proteins, in addition to s, are involved in the pathotype of the virus [22] . to examine the sequence changes in individual genes associated with attenuation of ibv for chickens, we sequenced and compared the full-length consensus genomes of pathogenic ibv viruses and egg-passaged attenuated (for chickens) viruses from three different serotypes. we also examined the replication of pathogenic and attenuated viruses in embryonated eggs and in chickens to determine whether there are differences in growth rate between the pathotypes. pathogenic and attenuated (for chickens) ibv strains from three different serotypes were used in this study. the pathogenic arkansas-delmarva poultry industry ark/ark-dpi/81 and the massachusetts strain mass/mass41/41 were obtained from dr. j. gelb, jr. (university of delaware, newark, de). the pathogenic georgia 98 virus, ga98/cwl0470/98 virus, was isolated in our laboratory in 1998 [23] . the pathogenic viruses were propagated in 10-day-old embryonated chicken eggs (ark/ark-dpi/81 pass 6, mass/mass41/41 pass 8, and ga98/cwl0470/98 pass 8) as previously described [24] . 
the attenuated viruses of the same strain and serotype were obtained from intervet and were designated ark-attenuated (mildvac-ark), mass41-attenuated (mildvac-h), and ga98-attenuated (mildvac-ga-98). whole-genome nucleotide and deduced amino acid sequence analysis. viral rna extraction, rt-pcr, library construction, and sequencing were conducted as previously described [25]. briefly, the viruses were filtered through a 0.8-µm filter then through a 0.22-µm filter (millipore, billerica, ma) prior to rna extraction. viral rna was purified using the high pure rna isolation kit according to the manufacturer's recommendation (roche diagnostic corporation, foster city, ca) and re-suspended in depc-treated water. reverse transcription (rt) and polymerase chain reaction (pcr) amplification were performed with the takara rna la pcr kit (takara bio inc., otsu, shiga, japan) using a random primer and an amplification primer in a strand displacement amplification reaction following the manufacturer's protocol. the sequence of the random reverse transcription primer was 5′-agc ggg ggt tgt cga atg ttt gan nnn n-3′, and the amplification primer sequence, which is designed to anneal to the complement of the conserved region on the random primer, was 5′-agc ggg ggt tgt cga atg ttt ga-3′. both primers were obtained from integrated dna technologies, inc. (coralville, ia). for the rt reaction, a master mix was prepared, which included mgcl2 (5 mm), 10× rna pcr buffer (1×), dntp mixture (1 mm), rnase inhibitor (1 unit/µl), reverse transcriptase (0.25 units/µl), 5′ degenerate primer (2.5 µm), and rna (5.75 µl/reaction); 10 µl per sample was then aliquoted into a thermocycler tube. the reaction conditions for the rt reaction were 10 min at 30°c for primer annealing, then an hour at 50°c for extension, followed by a five-minute incubation at 99°c for inactivation of the enzyme and a five-minute period at 5°c.
a pcr master mix-which included at the final concentrations mgcl2 (2.5 mm), 10× la pcr buffer (1×), sterilized distilled water (32.25 µl), takara la taq (1.25 u/50 µl), and 5′ primer (0.2 µm)-was prepared and 10 µl of the rt reaction was added to 40 µl of the mix. the amplification reaction consisted of a 94°c step for 2 min followed by 30 cycles of 94°c for 30 s, 60°c for 30 s, and 72°c for 3 min. ten pcr reactions were combined for each virus and purified using the qiaquick pcr purification kit (qiagen, foster city, ca) and then run on a 1% agarose gel to visualize the amplified product. the pcr products were size selected by cutting out amplicons between 500 and 1500 bp from the gel. the amplicons were purified using the qiaquick (qiagen) gel purification kit. the topo cloning kit (invitrogen, life technologies, carlsbad ca) was used to clone the pcr products into the pcr-xl-topo vector according to the manufacturer's recommendations. then, one shot topo electrocompetent escherichia coli cells (invitrogen) were transformed using 30 µl of competent cells mixed with 2 µl of the ligation reaction and electroporated with settings at 20 kv and 200 Ω using a biorad (biorad gene pulser, hercules, ca). the electroporated cells were incubated at 37°c in 480 µl of super optimal broth medium for 1 h on a rotary shaker. the cultures were mixed with 70% glycerol and frozen at -80°c until plated on q-trays (genetix, boston, ma) containing liquid broth agar cat#3002-032 (mp biomedicals, llc, solon, oh) with 50 µg/ml of kanamycin. the q-trays were pre-warmed at 37°c before the entire culture (approximately 500 µl) was spread on the plates and incubated overnight at 37°c, then robotically picked with a q-bot (genetix, boston, ma). plasmid dna from the libraries of cloned cdna fragments for each virus was isolated using an alkaline lysis method modified for the 96-well format, and incorporating both hydra and tomtek robots (http://www.intl-pag.org/11/abstracts/p2c_p116_xi.html).
cycle sequencing reactions were performed using the bigdye™ terminator® cycle sequencing kit version 3.1 (applied biosystems, foster city, ca) and mj research (watertown, ma) thermocyclers. finished reactions were filtered through sephadex filter plates into perkin-elmer microamp optical 96-well plates. a 1/12-strength sequencing reaction on an abi 3730 was used to sequence each clone from both the 5′ and 3′ ends. each viral genome was sequenced to approximately 10× coverage. the accuracy of the sequence was ensured by generating data in both the 5′ and the 3′ directions. gaps and areas with less than 2× coverage were identified and specific primers were synthesized (idt) for rt-pcr amplification and sequencing of the ambiguous areas. the rt-pcr was conducted as described above, and the reaction conditions were 42°c for 60 min, 95°c for 5 min, then 10 cycles of 94°c for 30 s, 50°c for 30 s, 68°c for 90 s, followed by 25 cycles of 94°c for 30 s, 50°c for 30 s, 68°c for 90 s with 5 s added per cycle. the final elongation step was 68°c for 7 min, and then the reaction was cooled to 4°c. the pcr products were sequenced in both directions using the abi prism bigdye terminator v3.0 (applied biosystems, foster city, ca) and the specific primers that were used for amplification at a concentration of 15 ng. the amount of cdna added to the reaction ranged from 20 to 30 ng, and the sequencing reactions were analyzed on an abi 3730 (applied biosystems). chromatogram files and trace data were read and assembled using seqman pro, and genome annotation was conducted with seqbuilder (dnastar, inc., v.8.0.2, madison, wi). low-quality segments and vector sequence were trimmed from the ends of each sequence and removed from further analysis. full-length genomes were uploaded to the national center for biotechnology information (ncbi) open reading frame (orf) finder (http://www.ncbi.nlm.nih.gov/gorf/) to identify orfs.
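the orf identification step above was done with the ncbi orf finder. to make the idea concrete, the sketch below scans the three forward reading frames of a sequence for atg-to-stop stretches; it is not the ncbi tool and makes simplifying assumptions (only atg starts, standard stop codons, forward strand only, which is reasonable for a positive-sense rna genome, and a hypothetical `min_codons` length filter). the function name `find_orfs` and the toy sequence are also hypothetical.

```python
STOPS = {"taa", "tag", "tga"}


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the atg, end is exclusive and includes
    the stop codon; min_codons counts codons from atg up to (not including)
    the stop codon.
    """
    seq = seq.lower().replace("u", "t")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i                      # first atg opens the orf
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next orf in this frame
    return sorted(orfs)


# toy sequence (hypothetical): one short orf in frame 2
toy = "cc" + "atg" + "gct" * 3 + "taa" + "g"
print(find_orfs(toy, min_codons=3))
```

overlapping orfs in different frames are reported independently, since each frame is scanned on its own.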
nucleotide and deduced amino acid alignments were generated using clustalw, and phylogenetic trees with 1,000 bootstrap replicates were constructed in the megalign program (dnastar, inc.). hydrophilicity analyses using the hopp-woods and kyte-doolittle methods were conducted with the protean program (dnastar, inc.). the viruses were titrated in embryonated eggs at 10 days of incubation to obtain a 50% embryo infectious dose (eid50) according to previously published procedures (24). two-week-old chickens were given 1 × 10^4 eid50 of virus in 100 µl of pbs, equally divided between the intraocular and intranasal routes. due to isolator availability, different numbers of birds were tested for each virus. six birds were given ark/ark-dpi/81, 20 birds were given ark attenuated, 10 birds each were given mass/mass41/41, mass attenuated, and ga98 attenuated, and 12 birds were given ga98/cwl0470/98. each of the negative control groups consisted of 10 birds. clinical signs and lesions were recorded, and tracheal swabs were collected and placed in 1 ml of ice-cold pbs (ph 7.4) at 5 days post-exposure [26]. the presence of virus in the tracheal swab supernatant was determined by quantitative real-time rt-pcr [27]. tracheas were collected in 10% neutral buffered formalin, routinely processed into paraffin, and 5-µm sections were cut for hematoxylin and eosin staining. epithelial hyperplasia, lymphocyte infiltration, and the severity of epithelial deciliation were scored for each trachea with 1 being normal and 4 being severe [28]. as a measure of adaptation, we examined the growth of the ark/ark-dpi/81, ark attenuated, mass/mass41/41 and mass41-attenuated viruses in embryonated eggs and chicks. because of limited isolator availability, we did not include the ga98 viruses in this experiment. virus growth in embryonated eggs was examined by inoculating 1 × 10^5 eid50 of each virus into 30 eggs at 10 days of incubation via the chorioallantoic route.
for each virus, allantoic fluid was harvested from five eggs at 12, 24, 36, 48, 72, and 96 h after inoculation. the amount of virus present in fresh (not previously frozen) allantoic fluid was determined by quantitative real-time rt-pcr [27]. to examine virus growth in chicks, 1 × 10^5 eid50 of each virus was inoculated into 30 specific pathogen-free chicks at 1 day of age via the ocular/nasal route. tracheal swabs were collected from each of five birds at 12, 24, 36, 48, 72, and 96 h after inoculation and placed in 1 ml of ice-cold pbs (ph 7.4). once the birds were swabbed, they were removed from the study. the amount of virus present in the fresh (not previously frozen) tracheal swab supernatant was determined by quantitative real-time rt-pcr [27]. sequences generated in this study were submitted to genbank and assigned the following accession numbers: ark/ark-dpi/81 (gq504720); ark-attenuated (gq504721); ga98/cwl0470/98 (gq504722); ga98-attenuated (gq504723); mass/mass41/41 (gq504724); and mass41-attenuated (gq504725). the consensus full-length genome sequences of ark/ark-dpi/81, ark-attenuated, ga98/cwl0470/98, ga98-attenuated, mass/mass41/41, and mass41-attenuated were determined, and the genome sizes were found to be 27,651 nt, 27,620 nt, 27,638 nt, 27,621 nt, 27,475 nt, and 27,451 nt, respectively. the genome organization, consisting of a 5′ untranslated region (utr), polyproteins 1a and 1ab, spike, 3a, 3b, envelope, membrane, 4b, 5a, 5b, nucleocapsid, and 3′ utr, was the same for all six viruses (table 1). gene locations for the nsps in orf 1a and 1ab are shown in table 2. the 4b protein, previously recognized in m41 [21], is 94 amino acids long and located downstream from the membrane protein in all the viruses sequenced. a blast search was conducted, and we found the protein to have 96% sequence identity with the 4b protein from tcov (genbank accession number eu022526.1).
in addition, a 6b protein downstream of the nucleocapsid protein was similar to the predicted 6b orf reported for tcov (genbank accession number eu022526.1). the 6b orf was identified in the ark and ga98 viruses but not in the mass 41 viruses. alignment and phylogenetic analysis of the full-length genomes show that ark/ark-dpi/81 has 99.1% sequence identity with ark-attenuated, ga98/cwl0470/98 has 97.1% sequence identity with ga98-attenuated, and mass/mass41/41 has 92.3% sequence similarity with mass41-attenuated (fig. 1). nucleotide and amino acid sequence differences were identified between each of the pathogenic and attenuated viruses (table 3). when the genome sequences are compared, there are 249 nucleotide (nt) changes resulting in 62 amino acid changes in the coding regions between the ark viruses, 629 nt changes resulting in 268 amino acid changes between the ga98 viruses, and 1,805 nt changes resulting in 462 amino acid changes between the mass 41 viruses (see table 3 and supplemental data tables 5 and 6). the size of the 5′ utr is 528 nt for all the viruses (table 1). the number of nt differences between the ark viruses for the 5′ utr was 25 with a 95.6% identity. the ga98 viruses have 6 nt differences with 98.9% identity, and the mass viruses have 12 nt differences with 98.3% identity in the 5′ utr (table 3). the leader junction sequence, nucleotides 57-64 (5′-cttaacaa), was found to be identical for the ark and mass viruses, whereas the ga98/cwl0470/98 pathogenic virus leader junction sequence is 5′-ctcaacaa and the ga98 attenuated virus sequence is 5′-ctttacaa. the transcriptional regulatory sequences (trs) were identical in all of the viruses and were 5′-ctgaacaa-3′ for mrnas 2 and 3, and 5′-cttaacaa-3′ for mrnas 4, 5, and 6.
the size of the 3′ utrs is 273 nt for ark/ark-dpi/81 pathogenic and ark-attenuated, 276 nt for ga98/cwl0470/98, 244 nt for ga98-attenuated, and 322 nt for mass/mass41/41 and mass41-attenuated (table 1). the number of nt differences within the 3′ utrs for the ark viruses is 6 with 98.5% identity. the ga98 viruses have 9 nt differences resulting in 97.1% identity, and the mass viruses have 2 nt differences with 99.4% identity within the 3′ utrs (table 3). the 3′ utrs contain the s2m motif, which is 41 nt long with a sequence identity of 92.7% or higher between the six viruses. analysis of the locations and number of sequence differences between pathogenic and attenuated viruses of the same serotype for individual nsps in polyproteins 1a and 1a/b (table 3) shows that nsp 3 has the highest number of amino acid differences among all the nsps. in addition, nsp 3 has the greatest number of differences when coding regions across the entire genome are compared. a schematic representation of nsp 3 and the number of amino acid changes in each domain is presented in fig. 2. the nsp 3 orf has 43.66% of all amino acid differences observed between ark/ark-dpi/81 and ark-attenuated (including a ten amino acid deletion in the attenuated virus at positions 789-798), 34.75% of all amino acid differences observed between ga98/cwl0470/98 and ga98-attenuated (including an eight amino acid deletion in the pathogenic virus at positions 901-908 and a three amino acid deletion in the pathogenic virus at positions 950-952), and 37.08% of all amino acid differences observed between mass/mass41/41 and mass-attenuated (including a ten amino acid deletion in the attenuated virus at positions 797-806). these changes represent 1.96, 5.18, and 11.06 differences per 100 amino acids within nsp 3 for ark, ga98 and mass 41, respectively.
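the three summary statistics used throughout these results (percent identity, differences per 100 residues, and the share of all differences that fall in one region) are simple ratios. the sketch below shows one way to compute them from two pre-aligned sequences of equal length. it is an illustration only: the function names and toy sequences are hypothetical, and treating every gapped column as one difference is an assumption, since the text does not state how deletions were counted.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count alignment columns where the two aligned sequences differ.

    Assumption: a gap ('-') opposite a residue counts as one difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Identical columns as a percentage of alignment length."""
    if not seq_a:
        raise ValueError("empty alignment")
    n = len(seq_a)
    return 100.0 * (n - count_differences(seq_a, seq_b)) / n


def differences_per_100(n_diff: int, length: int) -> float:
    """Normalize a difference count to a region of 100 residues."""
    return 100.0 * n_diff / length


def share_of_all_differences(n_diff_region: int, n_diff_total: int) -> float:
    """Percentage of genome-wide differences that fall in one region."""
    return 100.0 * n_diff_region / n_diff_total


# Hypothetical toy peptides, not sequences from the study.
pathogenic = "MKTAYIAKQR"
attenuated = "MKTAHIAKQK"
n = count_differences(pathogenic, attenuated)
print(n, percent_identity(pathogenic, attenuated), differences_per_100(n, len(pathogenic)))
```

normalizing per 100 residues is what allows a long protein such as nsp 3 to be compared fairly with short orfs, which is the comparison the text makes between nsp 3 and spike.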
we also found a virus subpopulation within the ark/ark-dpi/81 strain, which had a ten amino acid deletion in nsp 3 at positions 789-798 similar to the ark-attenuated virus. the catalytic triad of the pl2 protease (amino acids cys623, his786, and asp802) [29] was conserved among all of the viruses, and a hydrophobicity plot of nsp 3 predicted four transmembrane regions between amino acids 1,000 and 1,300 (data not shown). the fewest amino acid changes for the nsps between pathogenic and attenuated viruses within a serotype are found in nsps 7-10, which are the rna-binding proteins. the polyprotein 1ab -1 frame-shift slippery sequence (5′-uuuaaac) is conserved among all six viruses, but its location was found at nt 12,328 for ark/ark-dpi/81, nt 12,298 for ark-attenuated, nt 12,321 for ga98/cwl0470/98, nt 12,360 for ga98-attenuated, nt 12,391 for mass/mass41/41 and nt 12,327 for mass41-attenuated. the percent amino acid identity for the s glycoprotein is 97.8% for ark viruses, 96.6% for ga98 viruses, and 97.2% for mass 41 viruses (fig. 3). the numbers of amino acid differences within the s glycoprotein between pathogenic and attenuated viruses are 7, 33, and 27 for ark, ga98, and mass 41, respectively (table 3). the s glycoprotein for the ark viruses had 9.86% (0.60 differences/100 amino acids) of all amino acid differences, which makes it the third most variable orf in the entire genome after nsp 3 and 12. for the ga98 viruses, the s glycoprotein has 13.36% (2.82 differences/100 amino acids) of all amino acid differences, which makes it the third most variable orf in the entire genome after nsp 3 and orf 6b. the s glycoprotein for the mass 41 viruses has 5.77% of all amino acid differences (2.31 differences/100 amino acids), which makes it the fourth most variable orf in the entire genome after nsp 3, 2, and 4. orf 3b has the fewest differences, with no differences observed between the ark viruses, whereas the ga98 and mass viruses each have one amino acid difference.
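locating the slippery sequence in each genome, as reported above, amounts to a motif search that returns 1-based coordinates. a minimal sketch follows; the function name and the toy sequence are hypothetical, and the input is assumed to be a plain nucleotide string written as either dna or rna.

```python
from typing import List


def find_motif(genome: str, motif: str) -> List[int]:
    """Return 1-based start positions of every occurrence of motif.

    Both strings are upper-cased and T is converted to U so that cDNA and
    RNA spellings match. Overlapping occurrences are reported.
    """
    genome = genome.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    positions = []
    i = genome.find(motif)
    while i != -1:
        positions.append(i + 1)  # convert 0-based index to 1-based coordinate
        i = genome.find(motif, i + 1)
    return positions


# Hypothetical toy sequence containing the heptamer twice.
toy = "ACGTTTAAACGGGTTTAAACTT"
print(find_motif(toy, "uuuaaac"))
```

a seven-nucleotide motif can occur elsewhere in a genome of this size by chance, so in practice a hit would also need to lie in the orf 1a/1b overlap before being accepted as the frame-shift site. differences in the reported coordinate between viruses are what one would expect when insertions or deletions lie upstream of the site.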
for orf 4b, no amino acid differences are observed for the ark viruses, 16 amino acid differences are observed between the ga98 viruses, and 17 amino acid differences are observed between the mass 41 viruses. the ark virus 6b proteins have only one amino acid mutation and are 99.9% similar to each other, whereas the ga98 virus 6b proteins have 43 amino acid mutations, 3 amino acid deletions, and 1 substitution and are only 41.9% similar. because this protein has not been previously recognized in ibv, a nucleotide blast search rather than an amino acid search was conducted and showed that the ga98/cwl0470/98 virus has 98% identity with mass h120 (fj888351) and the ga98-attenuated virus has 98% identity with ark-dpi (eu418976). to determine whether the ga98-attenuated virus 6b sequence was a subpopulation within the ga98/cwl0470/98 virus, two forward primers (ga98a #1 5′-tcacgctcaagttcaagacctg-3′, and ga98a #3 5′-cagctttaggtgagaatgaact-3′) and two reverse primers (ga98a #2 5′-tacgataaaacaaactaatgagaa-3′, and ga98a #4 5′-ttgataggaaagcacagaaatag-3′) specific for the ga98-attenuated 6b sequence were used in combination in an rt-pcr assay, but no amplicons were observed. the data on pathogenicity of the viruses in 2-week-old spf chicks are presented in table 4.
table footnote a: positions are based on 1ab from tcov (accession number yp_001941164) and presented as the residue position, with 1 being the methionine at the beginning of orf 1a and 1ab, followed by the single letter code for the amino acid at that position (as in 1m).
table 4 footnotes: a, birds were given 1 × 10^4 50% embryo infectious doses intraocularly/intranasally and examined for clinical signs, virus, and lesions at 5 days post-inoculation. b, virus was detected in tracheal swabs by real-time rt-pcr as previously described by callison et al. [27]. c, epithelial hyperplasia, lymphocyte infiltration, and the severity of epithelial deciliation were scored for each trachea with one being normal and four being severe. d, a representative control group from one of the experiments is presented; all of the data from the negative control groups were the same.
... (fig. 4a). the ark-attenuated virus, which is adapted to embryonated eggs, only killed ... chicks inoculated with virus at 1 day of age showed statistical differences (p < 0.1) in the amount of virus detected in the trachea between the ark/ark-dpi/81 and ark-attenuated viruses at 24, 48, 72, and 96 h post-inoculation, with the pathogenic ark/ark-dpi/81 having the higher amount of virus at each of the sample times (fig. 4b). although not statistically different, the chicks given the pathogenic ark/ark-dpi/81 virus also had more virus detected in the trachea than the chicks given the ark-attenuated virus at 12 and 36 h post-inoculation.
many studies have examined sequence changes in the structural proteins of ibv and found that most of the changes associated with adaptation to a particular host or with a particular virus pathotype occur in the spike glycoprotein [18, 19, 30]. only a few studies, however, have examined changes across the entire genome associated with biological characteristics of the virus [22, 31]. ammayappan et al. [22] found a total of 17 amino acid changes between the genomes of ark dpi 11, a pathogenic virus, and ark dpi 101, an attenuated virus, with four amino acid changes in nsp 3 and six amino acid changes in the s1 glycoprotein. based on those data, it was suggested that changes in the replicase sequence in addition to structural proteins might play a role in pathogenicity. fang et al. [31] found that 53.06% of all amino acid substitutions across the entire genome were located in the spike glycoprotein following adaptation of an attenuated avian coronavirus to primate cells, suggesting that spike plays a role in host adaptation.
in this study, we analyzed the consensus full-length genome for the pathogenic and attenuated viruses of three different ibv types and showed that within a virus type, 34.75 to 43.66% of all the amino acid changes between the pathotypes occurred in nsp 3, whereas changes in spike ranged from 5.8 to 13.4% of all changes. it should be noted, however, that spike had the highest number of differences between different serotypes of the virus, which is consistent with previous reports [5] [6] [7] [8] . a high percentage of differences between pathogenic and attenuated viruses within a serotype in nsp 3 suggests this region plays a key role in pathogenicity. the nsp 3 is a complex protein with multiple domains making it an attractive target for antiviral drug design [9, 32] . it is approximately 1,600 amino acid residues in length and consists of an acidic domain, an adp-ribose 1 phosphatase, the pl2 protease (a deubiquitinating protease), y and transmembrane domains. the acidic domain is of unknown function, however; there is some evidence that it possesses nucleic acid binding activity because it is consistently co-purified with singlestranded rna [33] . previous studies with other organisms indicate that electrostatic interactions from this type of domain play a key role in ligand binding [34] . influenza a viruses also contain a polymerase acidic protein (pa) that is required for the transcription and replication activity of the viral polymerase [34] . differences between pathogenic and attenuated ibv strains within a serotype, including deletions in ark and mass41 viruses, were in and around the acidic domain within nsp 3 (fig. 2) . thus, it is likely that the acidic domain plays a role in attenuation in chickens but the exact function(s) of the amino acids in this domain is unclear. 
it was interesting that we observed an eight and a three amino acid deletion in the pathogenic virus ga98/cwl0470/98 at positions 901-908 and 950-952, respectively, compared to the ga98-attenuated virus. since sequence insertions are not likely to occur during the attenuation process, the ga98-attenuated virus possibly represents a minor undetected subpopulation in the pathogenic virus, which was selected by passage in embryonated eggs. the adp-ribose-1 phosphatase domain within nsp 3 is relatively conserved between the pathogenic and attenuated strains. this domain has been shown in the beaudette laboratory attenuated strain of ibv not to function as an adp-ribose binding protein [35] . however, the triple glycine sequence that forms part of the adp-ribose binding site (gly47-gly48-gly49), which was not conserved in beaudette, is conserved in all of the viruses sequenced herein [35] . this suggests that the adp-ribose-1 protein may be functional in the pathogenic and attenuated ibv viruses and is consistent with the results of the mass 41-x domain as reported by xu et al. [14] . the adp-ribose-1 phosphatase may be important in pathogenicity of ibv because it has been shown to play a role in adp ribosylation, a post-translational protein modification involved in dna damage repair and transcription regulation [14] . in addition, it was reported that the adp-ribose-1 is dispensable for viral replication in tissue culture, suggesting that this domain is involved in regulation of viral replication rather than the actual replication process [36] . the pl2 domain is a papain-like protease that is responsible for the cleavage of the nsp 2/3 and 3/4 sites. most coronaviruses have two papain-like proteases; however, in ibv the pl1 protease is truncated and is nonfunctional [16] . the structure of the pl2 protease domain was determined to be a ''thumb-palm-finger'' motif [37] . 
this domain has also been shown to be a potent ifn antagonist by inhibiting the phosphorylation and nuclear translocation of interferon regulatory factor 3 (irf-3) causing a disruption in the activation of the type i ifn response through toll-like receptor 3 (tlr 3) or retinoic acid-inducible gene i (rig-i) [38] . although the catalytic triad of the pl2 protease is conserved, amino acid changes between the pathogenic and attenuated viruses are observed in the pl2 protease, which could affect the efficiency of this ifn antagonist leading to altered viral replication in the cell. the disruption of ifn signaling has been shown in many viral infections, including sars-cov, dengue virus, and paramyxoviruses [39] [40] [41] . the ibv pl2 viral protease was also shown to have characteristics similar to ubiquitin-specific proteases [42] . deubuquitinating proteases, which remove ubiquitin from proteins that have been marked by cellular mechanisms for atp-dependent degradation, could be a potential mechanism by which the virus can alter the cellular environment favoring replication. the y domain, containing transmembrane domains at its n-terminus, was originally described by gorbalenya et al. [43] and has been predicted to consist of three domains y1, y2, and y3, which may act together to form an enzymatic function [32] . the transmembrane domain is inserted into the endoplasmic reticulum (er) membrane co-translationally and plays an important scaffolding role for the replication transcription complex [9] . recently, it was shown that three transmembrane domains were predicted for the sars-cov nsp 3 but only two were found to span the er membrane orienting the protease domain of nsp 3 on the cytoplasmic side where viral replication occurs [13, 15] . in murine hepatitis virus (mhv), five transmembrane domains were predicted but only two domains were found to span the membrane, also locating the protease domain on the cytoplasm side [13, 15] . 
our sequence data for ibv predict four transmembrane domains within nsp 3. assuming the protease domain is located on the cytoplasmic side of the membrane, we predict that either two or all four transmembrane domains would be used. a chimeric ibv containing the replicase genes 1a and 1a/b from the attenuated beaudette strain and the structural genes from the pathogenic mass 41 strain was not pathogenic in chickens, indicating that the replicase proteins appear to be determinants of pathotype in ibv [2, 21]. our data strongly support these studies and further indicate that changes in nsp 3 play a key role in ibv pathotype. it should also be emphasized that pathogenicity in avian coronaviruses is likely polygenic, since we and others [22] observed amino acid substitutions in other viral proteins including spike. the 6b orf detected in tcov (genbank accession numbers acb87503 and acb87504) is identified in ark and ga98 viruses herein. only one amino acid difference was observed between the ark viruses, but 43 differences as well as 3 amino acid deletions and 1 insertion are observed between ga98 viruses. an attempt to identify a subpopulation in the ga98/cwl0470/98 pathogenic virus with the ga98-attenuated gene 6b was unsuccessful. it is not clear why gene 6b is so variable between the ga98 viruses, but it appears that recombination rather than mutations over time may have played a role. a nucleotide blast analysis indicated that the ga98/cwl0470/98 virus was 98% similar to mass h120, a vaccine virus, and the ga98-attenuated virus was 98% similar to ark-dpi, a pathogenic virus, suggesting an origin for those genes. nonetheless, assuming the 6b orf is expressed, it apparently does not play a role in defining pathotype. interestingly, we find differences between pathogenic and attenuated viruses in the 5′ and 3′ utrs. the 5′ and 3′ utrs play key roles in transcription and replication of coronaviruses [44].
however, the differences between the ark and mass viruses, which are 25 nt and 12 nt, respectively, for the 5′ utr, and 6 nt and 2 nt, respectively, for the 3′ utr, did not appear to affect replication as determined in embryonated eggs. the trs sequences for generation of the subgenomic mrnas were identical in all of the viruses; however, the leader junction sequences were different for ga98 viruses. different leader junction sequences could be important for attenuation since efficiency of subgenomic mrna production would affect growth of the virus [45]. differences are observed in the amount of virus detected in chickens given viruses with different pathotypes. when the same amount of virus was administered, birds given the attenuated virus compared to birds given the homologous pathogenic virus had less virus detected in the trachea at all sampling times, and the difference was statistically significant for most of the time points. thus, it appears that the amount of ibv replication in the trachea correlates with the ability of the virus to cause disease in chickens. attachment and entry, and replication of the attenuated virus (for chickens) were not impaired because it grew to the same titer (with the exception of one time point) as the pathogenic virus in 10-day-old embryonated eggs. inefficient attachment and entry into chicken host cells in vivo could be due to changes in spike, and decreased replication of the attenuated viruses could be due to the inability of the virus to overcome some as yet unidentified innate defense mechanism(s) in chicken cells that is not present in embryonic cells. domains within nsp 3 associated with the deubiquitinating protease or ifn antagonists are likely candidates for further research. in summary, we find that most changes associated with attenuation of ibv for chickens are located within nsp 3 and that the attenuated viruses have reduced replication in chickens but not in 10-day-old embryonated eggs.
changes in spike suggest that attachment and entry may have been affected and changes in nsp 3 suggest that the attenuated virus lost the ability to overcome some innate host cell defense mechanism in the mature chicken cell. the exact mechanism(s) surrounding the interaction of virus and host processes affecting virus replication have yet to be determined for ibv, but identifying the sequence changes in the virus responsible for reduced replication and attenuation is an important step in elucidating those mechanisms. finally, changes observed in nsp 3 and spike as well as in other viral genes support the polygenic nature of pathogenicity in avian coronaviruses. acknowledgments this work was supported by usda, csrees award number 2007-35600-17786. the authors appreciate the assistance that was provided by lauren byrd, carey stewart, and joshua jackwood in conducting these studies. key: cord-340438-9q3ic0ye authors: zhang, jianqiang; yim-im, wannarat; chen, qi; zheng, ying; schumacher, loni; huang, haiyan; gauger, phillip; harmon, karen; li, ganwu title: identification of porcine epidemic diarrhea virus variant with a large spike gene deletion from a clinical swine sample in the united states date: 2018-02-21 journal: virus genes doi: 10.1007/s11262-018-1542-7 sha: doc_id: 340438 cord_uid: 9q3ic0ye two genetically different porcine epidemic diarrhea virus (pedv) strains have been identified in the usa: us prototype (also called non-s indel) and s indel pedvs.
in february 2017, a pedv variant (usa/ok10240-8/2017) was identified in a rectal swab from a sow farm in oklahoma, usa. complete genome sequence analyses indicated this pedv variant was genetically similar to us non-s indel strain but had a continuous 600-nt (200-aa) deletion in the n-terminal domain of the spike gene compared to non-s indel pedvs. this is the first report of detecting pedv bearing large spike gene deletion in clinical swine samples in the usa. porcine epidemic diarrhea virus (pedv) is the causative agent of porcine epidemic diarrhea (ped) that was first recorded in europe in the 1970s [1, 2] . pedv spread to asia during the 1980s and 1990s and became endemic in pigs in asian countries [3] . in 2010, a severe ped outbreak occurred in china characterized by high morbidity in pigs of all ages and high mortality in neonatal piglets [4, 5] . in 2013, ped outbreaks were reported for the first time in the usa [6] and caused substantial economic losses [7] . subsequently, us-like pedvs were identified in other american countries and also emerged or re-emerged in some asian and european countries [8] . global pedvs exhibit significant genetic diversities. recently, lin et al. [8] proposed to categorize global pedv strains into classical, s indel, emerging north american non-s indel, and emerging asian non-s indel strains. in the usa, at least two genetically different pedv strains have been identified: the highly virulent pedv first identified in april 2013 associated with severe ped outbreaks was referred to as 'us prototype' or 'us original' or 'non-s indel' strain [9, 10] ; a clinically milder pedv variant identified in the usa in january 2014 which was different from the original highly virulent pedv strains, as reflected by insertions and deletions in the spike (s) gene, was designated as 's indel' pedv [10, 11] . 
in this case report, we describe, for the first time, identification of a pedv variant with a large spike gene n-terminal domain deletion from a clinical swine sample in the usa. at the iowa state university veterinary diagnostic laboratory (isu vdl), a nucleocapsid (n) gene-based real-time rt-pcr (rrt-pcr) is routinely used for the screening detection of pedv from clinical specimens [12] [13] [14]. if positive, a spike gene-based multiplex rrt-pcr can be further used to differentiate non-s indel from s indel pedv strains. in february 2017, rectal swabs collected from a sow farm in oklahoma, usa, were submitted to the isu vdl for pedv pcr testing. the samples were positive for pedv by the n gene-based rrt-pcr. subsequent pedv s gene-based differential rrt-pcr revealed that these samples were negative for s indel pedv (ct > 40) but positive for non-s indel pedv. generally, the pedv s gene-based differential rrt-pcr gave ct values 2-3 higher than the n gene-based rrt-pcr on the same sample. however, sample #8 gave unexpected results: strong positive by n gene-based rrt-pcr (ct 15.5) but weak positive for non-s indel pedv by the differential rrt-pcr (ct 36.8). to determine the possible reasons for this observation, sample #8 and another control sample #6 (ct 18.2 by n gene-based rrt-pcr and ct 20.8 for non-s indel by the s gene-based differential rrt-pcr) were sequenced using next-generation sequencing technology following previously described procedures [15, 16]. the pedv in sample #6 (usa/ok10240-6/2017) and sample #8 (usa/ok10240-8/2017) had whole genome sequences of 28,038 and 27,438 nucleotides in length, respectively. the sequences of these two pedvs have been deposited into genbank (mg334554 and mg334555).
edited by juergen a richt. genbank accession numbers: the complete genome sequences of two porcine epidemic diarrhea viruses described in this study have been deposited in genbank under accessions mg334554 and mg334555.
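the observation that prompted sequencing can be expressed as a simple check: compare the ct from the s gene-based assay with the ct from the n gene-based assay and flag samples whose gap is far larger than the usual 2-3 cycles. the sketch below uses the ct values reported above for samples #6 and #8; the function names and the cutoff of 3.0 cycles are hypothetical choices made for illustration and are not a diagnostic rule from the study.

```python
def ct_gap(n_ct: float, s_ct: float) -> float:
    """Difference between the s gene-based and n gene-based ct values."""
    return s_ct - n_ct


def is_discordant(n_ct: float, s_ct: float, max_expected_gap: float = 3.0) -> bool:
    """Flag a sample whose s-assay ct lags the n-assay ct by more than expected.

    max_expected_gap is an assumed cutoff based on the 2-3 ct offset
    described in the text; it is not a validated threshold.
    """
    return ct_gap(n_ct, s_ct) > max_expected_gap


# ct values as reported in the text: (n gene-based, s gene-based non-s indel).
samples = {"#6": (18.2, 20.8), "#8": (15.5, 36.8)}
for name, (n_ct, s_ct) in samples.items():
    print(name, round(ct_gap(n_ct, s_ct), 1), is_discordant(n_ct, s_ct))
```

a large gap of this kind is consistent with a sequence change under the s-assay primers or probe, which is what whole genome sequencing subsequently showed for sample #8.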
phylogenetic analyses based on the whole genome sequences and the spike gene indicated that both ok10240-6 and ok10240-8 belong to the us non-s indel cluster (fig. 1). however, compared to the ok10240-6 and other non-s indel pedv strains, the ok10240-8 pedv had a large continuous deletion of 600-nt (200-aa) in the spike gene/protein (nt ∆91-690; aa ∆31-230; fig. 2). the remaining genome of the ok10240-8 pedv, other than the s deletion region, had approximately 99.7% nt identity to other non-s indel pedv strains. a gel-based rt-pcr [17] was used to differentiate the ok10240-8 pedv from non-s indel pedv. twenty more samples were collected from the same farm; all of them contained non-s indel pedv but none of them contained ok10240-8-like pedv, indicating the prevalence of ok10240-8-like pedv in swine populations may be very low. virus isolation attempts on the sample #8 in vero cells (atcc ccl-81) were unsuccessful. the remaining sample #8 (250 µl diluted in 2250 µl culture medium) was orally inoculated into two 10-day-old pedv-negative piglets (10 ml/pig) but did not result in active infection. pedv spike (s) protein is a type i membrane glycoprotein with a signal peptide (amino acid residues 1-18), a large [18]. the s protein assembles into homotrimers that form the club-shaped projections (spikes) on the virion surface. pedv s protein has multiple functions including (1) mediating receptor binding through its s1 subunit (aa 1-729) and fusion of the viral and cellular membranes during cell entry through its s2 subunit (aa 730-1386); (2) harboring neutralization epitopes. specifically, the n-terminal domain (aa 19-233) exhibits sialic acid binding activity; the receptor-binding domain (aa 501-629) is believed to interact with a protein receptor; and a fusion peptide domain (aa 891-908) mediates virus-cell membrane fusion during cell entry [18].
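the nucleotide and amino acid coordinates given for the deletion (nt ∆91-690; aa ∆31-230) are related by simple codon arithmetic, which the following sketch makes explicit. the function names are hypothetical; positions are 1-based and counted from the first nucleotide of the spike start codon, as the reported coordinates imply.

```python
def nt_to_codon(nt_pos: int) -> int:
    """Map a 1-based nucleotide position in an orf to its 1-based codon number."""
    return (nt_pos - 1) // 3 + 1


def deletion_summary(nt_start: int, nt_end: int) -> dict:
    """Summarize a deletion given 1-based inclusive nucleotide coordinates."""
    nt_length = nt_end - nt_start + 1
    aa_start = nt_to_codon(nt_start)
    aa_end = nt_to_codon(nt_end)
    return {
        "nt_length": nt_length,
        "aa_start": aa_start,
        "aa_end": aa_end,
        "aa_length": aa_end - aa_start + 1,
        # a length divisible by 3 leaves the downstream reading frame intact
        "frame_preserved": nt_length % 3 == 0,
        # starting on a codon boundary means only whole codons are removed
        "codon_aligned": (nt_start - 1) % 3 == 0,
    }


summary = deletion_summary(91, 690)
print(summary)

# the two reported genome lengths differ by the same amount
print(28038 - 27438)
```

under these conventions the reported nucleotide range corresponds to codons 31 through 230, a 200-residue in-frame deletion, and the 600-nt size matches the difference between the two genome lengths reported for samples #6 and #8.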
neutralization epitopes have been reported within the amino acid residues 1-219, 499-638, 636-789, and 1371-1377 [19] [20] [21] [22]. the aminopeptidase n protein (apn) serves as a receptor for several alphacoronaviruses such as canine coronavirus type ii, feline coronavirus type ii, transmissible gastroenteritis virus (tgev), porcine respiratory coronavirus (prcv), and human coronavirus 229e [18]. porcine apn was considered to be the putative receptor of pedv with some supporting evidence [23] [24] [25] [26]; however, some recent studies indicate that porcine apn may not be a functional receptor for pedv [27, 28]. the n-terminal domain of pedv s protein is one of the most variable regions in the pedv genome. the insertions and deletions of s indel pedv strains and the large deletion (aa ∆31-230) of the pedv variant (ok10240-8) identified in this study are all located in the n-terminal domain region. it is predicted that a deletion of 200-aa in this region would not interfere with either the protein receptor binding or the neutralization epitopes 499-638, 636-789, and 1371-1377. 3-d structural analyses of the s protein also suggest that this 200-aa deletion may not interfere with trimer formation. however, the neutralization epitope within residues 1-219 and the sialic acid binding activity of the virus may be affected by this 200-aa deletion. in fact, some studies have shown that pedv strains having variations in the n-terminal domain of s protein exhibited different sialic acid binding activities [18, 29]. it remains to be determined whether the activity of sialic acid binding by the s protein affects virus entry into cells, replication in cells, and pathogenicity in pigs. in addition, the 200-aa deletion in the ok10240-8 pedv variant may affect virus virulence and pathogenicity.
construction of a recombinant pedv carrying 200-aa deletion in this it was previously reported [30] that a cell cultureadapted us pedv isolate tc-pc177-p2 contained 591nt (197-aa) deletion in the s protein (aa ∆34-230) but such deletions were not present in the original clinical sample oh/pc177/2013 (fig. 2) . a japanese pedv strain tottori2/2014, identified in a clinical sample, contained 582-nt (194-aa) deletion in the s protein (aa ∆23-216) [31] . a korean pedv strain mf3809/2008, identified in a clinical sample, contained a 612-nt (204-aa) deletion in the s protein but in a different location (aa ∆713-916) [32] . a recent study reported the coexistence of pedv with a large s gene deletion and pedv with intact s gene in domestic pigs in japan [33] . it has been demonstrated that the usa/ tc-pc177-p2 and jpn/tottori2/2014 isolates harboring a large s gene deletion are less virulent than non-s indel pedvs in experimentally inoculated pigs [8, 29, 34] . in terms of tgev, a large (224-aa) deletion in the spike gene changed the viral tropism from intestinal to respiratory and this tgev mutant was later renamed as prcv [35] . in contrast, the pedv variant tc-pc177-p2 with large s gene deletion did not change intestinal tropism [8, 29] . in summary, a new pedv variant strain (usa/ ok10240-8/2017) belonging to the non-s indel cluster but with a 600-nt deletion (200-aa deletion) in the n-terminal domain of the s gene was identified in this study. this is the first report of a pedv strain with a large deletion in the s gene identified in clinical swine samples in the usa. this pedv with large s gene deletion was present on the same farm where non-s indel pedv with intact s gene was detected but it appeared that the prevalence of ok10240-8-like pedv in swine populations may be low. additional molecular epidemiological studies are needed to monitor the emergence of novel pedv variants and determine their prevalence levels in us swine. 
pig farm acknowledgement this study was supported by the iowa state uni
key: cord-344558-1jgqofbr authors: kocherhans, rolf; bridgen, anne; ackermann, mathias; tobler, kurt title: completion of the porcine epidemic diarrhoea coronavirus (pedv) genome sequence date: 2001 journal: virus genes doi: 10.1023/a:1011831902219 sha: doc_id: 344558 cord_uid: 1jgqofbr the sequence of the replicase gene of porcine epidemic diarrhoea virus (pedv) has been determined. this completes the sequence of the entire genome of strain cv777, which was found to be 28,033 nucleotides (nt) in length (excluding the poly a-tail). a cloning strategy, which involves primers based on conserved regions in the predicted orf1 products from other coronaviruses whose genome sequence has been determined, was used to amplify the equivalent, but as yet unknown, sequence of pedv. primary sequences derived from these products were used to design additional primers resulting in the amplification and sequencing of the entire orf1 of pedv. analysis of the nucleotide sequences revealed a small open reading frame (orf) located near the 5′ end (nt 99–137), and two large, slightly overlapping orfs, orf1a (nt 297–12650) and orf1b (nt 12605–20641). the orf1a and orf1b sequences overlapped at a potential ribosomal frame shift site. the amino acid sequence analysis suggested the presence of several functional motifs within the putative orf1 protein. by analogy to other coronavirus replicase gene products, three protease motifs and one growth factor-like motif were seen in orf1a, and one polymerase domain, one metal ion-binding domain, and one helicase motif could be assigned within orf1b. comparative amino acid sequence alignments revealed that pedv is most closely related to human coronavirus (hcov)-229e and transmissible gastroenteritis virus (tgev) and less related to murine hepatitis virus (mhv) and infectious bronchitis virus (ibv).
these results thus confirm and extend the findings from sequence analysis of the structural genes of pedv. porcine epidemic diarrhoea virus (pedv) is a causative agent for diarrhoea in pigs, particularly in neonates. the disease has been recognised for approximately thirty years, but the causative virus was first described only in 1978 [1], while another ten years elapsed before a method was developed for propagation of the virus in cell culture [2]. during this time, outbreaks of the disease were reported from numerous european countries as well as korea, china and japan. the epidemiology and pathogenesis of the disease have been well described by pensaert [3]. the biological behaviour, electron microscopic appearance and polypeptide structure of pedv resulted in its provisional classification as a coronavirus [2, 4, 5]. coronaviruses belong to the taxonomic order of nidovirales and contain a single stranded rna genome of positive polarity, which is approximately thirty kilobases in length. the genes encoding the structural proteins are located at the 3′ end of the genome. an astonishing two-thirds of the genome consists of the replicase gene, which is located at the 5′ end of the genome. the replicase proteins are encoded by orf1a and orf1b. these two long, slightly overlapping orfs are connected by a ribosomal frame shift site in all coronaviruses sequenced to date. this regulates the ratio of the two polypeptides encoded by orf1a and the readthrough product orf1ab. about 70–80% of the translation products are terminated at the end of orf1a, and 20–30% continue to the end of orf1b. the polypeptides are post-translationally processed by viral encoded proteases [reviewed by 6]. these proteases are encoded within orf1a; the polymerase and helicase functions are encoded by orf1b. we have previously completed the sequencing of the nucleocapsid (n), membrane (m), small membrane (e), orf3 and spike (s) genes of the pedv strain cv777 [7–9].
the alignment of the deduced amino acid sequences indicated that pedv occupies an interesting intermediate position between the two well-characterized members of the group i coronaviruses, transmissible gastroenteritis virus (tgev) and human coronavirus (hcov)-229e. in this study, we have continued to determine and analyse nucleotide sequences of pedv. to our knowledge, only two group i coronaviruses have been sequenced completely, hcov-229e and tgev [10, 11]. in addition, two strains of mouse hepatitis virus (mhv), jhm and a59 belonging to the group ii coronaviruses, and infectious bronchitis virus (ibv) have been completely sequenced [12–15]. therefore, the sequence presented in this paper is the sixth sequence of a coronavirus covering the entire genome. growth of cell-adapted pedv strain cv777 was performed essentially as has been described elsewhere [2, 8], except that virus-infected cells were harvested at approximately 18 h post infection. cells were freeze-thawed three times and cell debris removed by low speed centrifugation. virus was pelleted by centrifugation for 2 h at 22,000 rpm and 4 °c in a sw28 rotor of a beckman centrifuge. virus pellets prepared from two 175 cm² flasks were pooled and resuspended in 1 ml trizol™ (gibco-brl), and rna was prepared as recommended by the manufacturer. in order to obtain the first partial pedv specific sequences, the predicted amino acid sequences of the hcov-229e and tgev polymerase orfs were aligned and homologous regions identified. the homologous regions were used to design degenerate primers [9] that were used for rt-pcr amplifications. these initial amplicons were cloned and sequenced [9]. later, a mixture of up to six antigenome sense primers based on pedv specific sequences or the degenerate primers and random hexamer primer (purchased from schmidheini ag; balgach, switzerland) was used for first strand cdna synthesis.
rna prepared from two 175 cm² flasks of virus-infected cells was denatured for 10 min at 65 °c and first strand cdna synthesis was performed in a 20 µl total reaction volume using superscriptii™ (gibcobrl; basel, switzerland) according to the manufacturer's protocol. this was modified to create the longer reverse transcription products by including a denaturation step at 95 °c for 5 min following the first 1 h incubation at 42 °c, followed by the addition of 1 µl superscriptii™ and a second prolongation step of 1 h at 42 °c. template rna was digested by adding 1 µl rnaseh (gibcobrl; basel, switzerland) to the reaction mix and incubating at 37 °c for 20 min. pcr amplification was performed as described elsewhere. in brief, pfu dna polymerase (stratagene; basel, switzerland) was used for the amplifications, which were performed on a dna engine (mj research) machine. pcr fragments were subsequently cloned into pbluescript ii ks or puc19 vectors using standard procedures. the nucleotide sequence was determined on these cdna clones. direct sequencing was performed on an rt-pcr product (see fig. 1b), which was cleaned through an agarose gel. the contigs of the sequence determinations were constructed using seqman (dna*, lasergene, madison wi, usa). we previously reported the determination of the pedv leader sequence on the mrna encoding the n-gene [16]. this sequence was used for the primer design in order to amplify the 5′ end of the genome. the leader sequence was used for the in silico construction of the genomic rna sequence, which is available in the genbank database (accession number af353511). virus sequences covering replicase genes were obtained from the genembl sequence database. the files with the accession numbers x69721, z34093, af029248, and m95169 for hcov-229e, tgev (purdue 115), mhv-a59, and ibv (beaudette) respectively were used.
the deduced amino acid sequences were compared as indicated in the text using pileup and gap (gcg package version 10.0; madison, wi, usa). the files generated by pileup were used in distances (gcg package version 10.0; madison, wi, usa) to determine the kimura protein sequence distances, which were subsequently used for the construction of unrooted dendrogram using treegen on the cbrg server (http://cbrg.inf.ethz.ch/) the cloning approach we used previously to clone the pedv m and n genes involved designing primers based on conserved regions of the coronavirus m and n genes to amplify the equivalent to the unknown pedv sequence. in this study, we employed this technique to clone parts of the orf1 of pedv. such a method is useful for viruses which do not grow to high titre, avoids lengthy screening of clones and could potentially be applied to the cloning of any group i coronavirus. however, the large size of orf1 and the paucity of sequence data from other coronaviruses made this an ambitious objective. a number of conserved functional domains were identified in the predicted orf1 products, but these domains are mainly located in the orf1b region and leave large regions of the orf1a product with no known function and only a low level of sequence conservation between different coronavirus genomes. in order to clone and determine the sequences for the pedv orf1, the predicted amino acid sequences of the hcov-229e and tgev orf1 were aligned and homologous regions identified. the hcov-229e and tgev orfs were sufficiently closely related to allow complete alignment of the predicted expression products. in contrast, the mhv and ibv sequences were much more divergent, and could only be aligned with the group i sequences in some of the conserved regions. degenerate primers were designed from regions conserved between the hcov-229e and tgev and, where possible, mhv and ibv orf1. these primers were used both to prime reverse transcription and for the pcr amplifications. 
sequence data derived from these pcr products allowed us to design sequence-specific primers which were then used to amplify the entire orf1 (see fig. 1b). numerous small cdna clones, five large cdna clones and one rt-pcr product covering the 5′ two-thirds of the pedv genome were used to determine the nucleotide sequence of the pedv orf1 (fig. 1). this analysis completes the nucleotide sequence of pedv, which is thereby the sixth entire sequence determined from a coronavirus genome [10–13,15]. the genome of pedv (cv777) excluding the poly a-tail is 28033 nt in length. analysis of the newly determined nucleotide sequence revealed a pattern of orfs typical of coronaviruses. a small orf with the potential to code for a 12-amino acid peptide was found at the 5′ end of the genome from nucleotide position 99–137. such small orfs (uorfs) are present in all coronaviruses sequenced so far. the uorfs of hcov-229e [17] and ibv [15] are found to be eleven codons in length, while that of mhv is eight codons long [18, 19]. that of tgev can only encode a three-amino acid peptide [20]. two long orfs of 12354 and 8037 nt, which overlap by 46 nt, covered most of the newly determined sequence. by analogy to published coronavirus sequences [15, 17, 20], the orfs were designated orf1a and orf1b. the predicted orf1a of pedv extended from nucleotide 297 to 12650. this resulted in a 4117-codon orf. the overlapping orf1b starting at nucleotide 12605 and ending at nucleotide 20641 had the capacity to code for 2678 amino acids. it has been proposed for coronaviruses and other members of the order nidovirales [21] that the nucleotide sequences in the overlapping regions of orf1a and orf1b are able to fold into a pseudoknot tertiary structure [22, 23]. this structure allows the ribosome to shift reading frame during translation of orf1a and to continue translation in orf1b.
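the orf sizes quoted above follow directly from the genome coordinates, and a short sketch makes the arithmetic explicit. it assumes 1-based, inclusive coordinates and that each quoted range includes the stop codon; both are assumptions that happen to reproduce the figures in the text, and the helper names are illustrative only.

```python
def orf_stats(start, end):
    """Return (length in nt, encoded amino acids) for a 1-based inclusive ORF.

    Assumes the range ends with a stop codon, which encodes no residue.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length, length // 3 - 1


UORF = (99, 137)
ORF1A = (297, 12650)
ORF1B = (12605, 20641)

# Number of nucleotides shared by orf1a and orf1b.
overlap_nt = ORF1A[1] - ORF1B[0] + 1

# Reading frame of orf1b relative to orf1a: 0 = same frame, 2 is read as -1.
frame_offset = (ORF1B[0] - ORF1A[0]) % 3
shift = frame_offset - 3 if frame_offset == 2 else frame_offset

if __name__ == "__main__":
    for name, (start, end) in {"uorf": UORF, "orf1a": ORF1A, "orf1b": ORF1B}.items():
        length, residues = orf_stats(start, end)
        print(f"{name}: {length} nt, {residues} aa")
    print("overlap:", overlap_nt, "nt; frame shift:", shift)
```

with these coordinates orf1b lies in the -1 frame relative to orf1a, which is consistent with the ribosomal frame shift described for the overlap region.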
the function of these rna structures as ribosomal frame shift sites was demonstrated for the analogous sequences of ibv [24] and hcov-229e [25] . it seems likely that the translation of the pedv orf1b is mediated by such a ribosomal frame shifting. the nucleotide sequences of pedv, hcov-229e, and tgev covering the ribosomal frame shift site are more conserved to each other than to mhv-a59 or ibv. in order to identify the sequence which could be involved in the formation of the tertiary structure, the nucleotide sequences covering the end of orf1a and the beginning of orf1b from hcov-229e [25] and tgev [20] were aligned with the corresponding sequence of pedv. fig. 2a shows the predicted frame shift region of pedv based on this comparison. the so-called slippery site (uuuaaac) at which frame shifting occurs is identical in all coronaviruses sequenced so far. the stems and loops required to provide the tertiary structure of the frame shift regions of tgev and hcov-229e were compared and fig. 2b shows the predicted tertiary structure required for the frame shift of pedv based on this comparison. pairwise comparison of the deduced amino acid sequences (using gap) revealed that orf1b of pedv is more conserved than orf1a to corresponding sequences of other coronaviruses. the percentage of similarities and identities is shown in table 1 . the putative protein sequence of orf1a was most similar to the sequence of orf1a of hcov-229e (59.4%) and less similar to the corresponding orf1a of tgev (52.1%), mhv-a59 (39.5%) and ibv (38.7%). the same relationship, but at a higher level of similarity, was true for the deduced amino acid sequence of the predicted pedv orf1b. it was most similar to the amino acid sequence of hcov-229e orf1b and tgev orf1b (83.2% and 80.3%, respectively). the similarity to the orf1b from mhv-a59 and ibv was around 64%. 
the deduced amino acid sequences of orf1a and orf1b from pedv were aligned with the corresponding sequences of hcov-229e, tgev, mhv-a59, and ibv using pileup. the degrees of amino acid homologies are graphically presented as dendrograms (fig. 3a,b). the multiple sequence alignments revealed several putative functional domains common to coronavirus sequences [23, 26] located on the deduced amino acid sequence of orf1ab of pedv. some of these had been used to design the primers for the rt-pcr amplification. in the orf1a region the following motifs were observed. two motifs indicative of papain-like proteases (plp) were present at amino acid positions 1077–1266 and 1716–1917. the plp motif is found twice in the replicase genes of hcov-229e, tgev and mhv, but only once in that of ibv. in this respect, pedv resembles hcov-229e, tgev and mhv rather than ibv. a highly conserved region (x-domain) was found between the two plp motifs. despite this motif being present in all coronavirus sequences, its function is not yet known. a picornavirus 3c-like (3cl) protease domain is located between amino acids 2998 and 3299 of the pedv orf1a. all corona- and arteriviruses encode this motif, which is the main protease for the coronavirus-mediated processing of the polyproteins. three markedly hydrophobic domains conserved among coronaviruses are found in orf1a. the first is located after the second plp motif and the others flank the 3cl motif. finally, a growth factor-like (gfl) domain was located close to the end of orf1a (amino acid position 3965–4000). in the orf1b region, three structural protein motifs could be recognized, which all play a role in viral replication. a sub-sequence at amino acid position 4636–4939 containing the characteristic tripeptide sdd (or gdd in most rna viruses) [26] is probably the active site for the rna-dependent rna polymerase.
a metal ion-binding domain covering amino acids 5027–5103 and a helicase motif at amino acid positions 5309–5624 were also observed in the pedv orf1b product. alignments of the deduced amino acid sequences of the 3cl protease and the polymerase motif from five different coronaviruses are shown in fig. 4a and 4b, respectively. the findings concerning conserved domains are summarised in fig. 1a. a deletion of about 180 amino acids located between the x-domain and the second plp motif in the putative orf1a sequence of tgev compared to that of hcov-229e was reported by eleouet et al. [20]. this additional sequence was present in the pedv orf1a product. the alignment (using gap) of the hcov-229e and pedv amino acid sequences revealed 42.5% similarity and 31.5% identity in this region. earlier sequence analysis of pedv based on the structural protein sequences has shown that pedv is most closely related to hcov-229e and tgev [7–9,27], less related to mhv-a59, and least related to ibv. however, it was not possible to determine the relative similarities of hcov-229e, tgev and pedv. in this study, the similarities and identities of the amino acid sequence alignments based on orf1a and orf1b show clearly that pedv is most closely related to hcov-229e and, moreover, that hcov-229e is more similar in sequence to pedv than it is to tgev. in addition to the sequence analysis, the presented work offers various possibilities for future research on coronaviruses. functional analysis and processing of the as yet uncharacterised pedv orf1 is now possible. recently, almazan et al. and yount et al. achieved the generation of infectious tgev from cdna [28, 29] and thiel et al. succeeded in generating full length cdna clones of hcov-229e and ibv in a recombinant vaccinia virus system [30].
the sequence and the cdna clones covering the entire genome of pedv would allow the development of a mini-genome system to study viral replication or the generation of an assembled, infectious cdna clone. bearing in mind the close relationship of pedv and hcov-229e, the latter approach could be used to exchange functional parts of these viruses to gain new insights into the biology of these viruses. the authors thank christa meyer for excellent technical assistance. these studies were supported by the swiss national science foundation, grant #31-43503.95.
key: cord-011794-ejoufvvj authors: binder, florian; reiche, sven; roman-sosa, gleyder; saathoff, marion; ryll, rené; trimpert, jakob; kunec, dusan; höper, dirk; ulrich, rainer g. title: isolation and characterization of new puumala orthohantavirus strains from germany date: 2020-04-23 journal: virus genes doi: 10.1007/s11262-020-01755-3 sha: doc_id: 11794 cord_uid: ejoufvvj orthohantaviruses are re-emerging rodent-borne pathogens distributed all over the world. here, we report the isolation of a puumala orthohantavirus (puuv) strain from bank voles caught in a highly endemic region around the city osnabrück, north-west germany. coding and non-coding sequences of all three segments (s, m, and l) were determined from original lung tissue, after isolation and after additional passaging in veroe6 cells and a bank vole-derived kidney cell line. different single amino acid substitutions were observed in the rna-dependent rna polymerase (rdrp) of the two stable puuv isolates. the puuv strain from veroe6 cells showed a lower titer when propagated on bank vole cells compared to veroe6 cells.
additionally, glycoprotein precursor (gpc)-derived virus-like particles of a german puuv sequence allowed the generation of monoclonal antibodies that reliably detected the isolated puuv strain in the immunofluorescence assay. in conclusion, this is the first isolation of a puuv strain from central europe and the generation of glycoprotein-specific monoclonal antibodies for this puuv isolate. the obtained virus isolate and gpc-specific antibodies are instrumental tools for future reservoir host studies. electronic supplementary material: the online version of this article (10.1007/s11262-020-01755-3) contains supplementary material, which is available to authorized users. edited by detlev h. kruger. puumala orthohantavirus (puuv) is the most important hantavirus in europe [1]. it causes the majority of human hantavirus infections and hemorrhagic fever with renal syndrome (hfrs) cases [2]. in central and western europe hantavirus outbreaks occur in two to five year intervals and are driven by massive increase of the bank vole (myodes glareolus) population, the reservoir of this orthohantavirus species [3]. human hantavirus disease has been notifiable in germany since 2001 and the majority of recorded cases is due to puuv infections in southern and western parts of germany, whereas dobrava-belgrade orthohantavirus (dobv) with the striped field mouse as reservoir causes infections in the northeastern part of germany [3]. the characterization of the pathogenicity and identification of virulence markers are highly dependent on adequate puuv isolates. currently, the number of puuv isolates is very limited and does not represent the real diversity of puuv strains in europe. in particular, no central european puuv isolate exists [4].
the majority of puuv isolates, and hantaviruses in general, was obtained based on passaging in reservoir animals or veroe6 cells and is highly adapted [5] [6] [7] . previous investigations indicated that veroe6 cell adaptation of puuv kazan strain results in the inability of the adapted strain to infect the bank vole reservoir [8] . the recent development of bank vole-derived primary or permanent cell lines may allow the isolation of reservoir-adapted puuv strains [9] [10] [11] [12] . hantavirus proteins are usually detected in infected cells by monoclonal antibodies. nucleocapsid (n) protein-specific monoclonal antibodies have been developed against a large range of hantaviruses [13] [14] [15] . in contrast, the number of glycoprotein precursor (gpc), as well as gc-and gn-specific monoclonal antibodies is rather low [16] [17] [18] . the majority of these antibodies were raised by infection of bank voles or immunization with recombinant n protein or heterologous virus-like particles (vlps). the generation of envelope protein-specific monoclonal antibodies with reactivity to virus proteins in infected cells is highly dependent on structural constraints [19] . autologous vlps represent a useful tool to generate highly efficient immune responses against a variety of viruses and for the generation of monoclonal antibodies in particular [20] . puuv strain astrup [21] gpc-derived vlps were generated in this study as previously described for maporal orthohantavirus [22] . lower saxony, north-west germany, and district osnabrück in particular, is a well-known endemic region for puuv infections [23, 24] . this endemic region was also again heavily affected by the hantavirus outbreak year 2019 [25] . here, we aimed to isolate a central european puuv strain from bank voles in the district of osnabrück using standard veroe6 cells and the recently established carpathian lineage bank volederived kidney cell line (mgn-2-r [10] ). 
complete genome determination by shot-gun and hybrid-capture-mediated highthroughput sequencing (hts) was used to follow the potential adaptation of the puuv isolates in veroe6 and reservoir cell lines. finally, the reactivity of the isolates was determined with novel monoclonal antibodies raised against puuv gpc vlps. bank voles were trapped in spring 2019 in the puuv endemic region around osnabrück following a standard snap trapping protocol [25, 26] . in the field, a small piece of lung was taken for virus isolation and rt-qpcr analysis. thereafter, carcasses were frozen, transported to the laboratory and completely dissected according to standard protocols. chest cavity lavage was collected by rinsing the chest cavity by 1 ml phosphate-buffered saline (pbs) and investigated for the presence of puuv-reactive antibodies. the presence of hantavirus rna was analyzed from lung tissue and were, in part, previously published in a surveillance study [25] . for virus isolation and further infection studies, veroe6 and bank vole kidney (mgn-2-r; [10] ) cells were used in parallel. virus titration was done on veroe6 cells only. mgn-2-r cells were grown in an equal mixture of hams' f12 and iscove's modified dulbecco's medium (imdm) + 10% fetal calf serum (fcs) and passaged two times per week at a 1:6 ratio. veroe6 cells were passaged twice a week in minimal essential medium (mem) + 10% fetal calf serum (fcs) and a split ratio of 1:4. for virus isolation, 1 × 10 5 mgn-2-r or veroe6 cells were seeded in 12.5 cm 2 flasks one day before rodent sampling in the field. the cells were carried to trapping sites in an isolation box with heat packs (around 33 °c constant for 2 days with outside temperature of 5-10 °c). after collecting voles from traps, a small incision in the chest area was made and a piece of lung (pea-sized) was taken and transferred into 1 ml dulbecco's modified eagle's medium (dmem) + 5% fcs + penicillin/streptomycin (ps) in a 5 ml safe lock tube. 
lung tissue material was homogenized in the field by grinding it through a fine metal grid against the tube wall. the homogenized tissue material was sterile filtered (0.45 µm) directly onto the cells resulting in approximately 500 µl tissue/medium suspension per 12.5 cm² flask. after 1-2 h incubation in the isolation box, 4 ml dmem + 5% fcs + ps was added. upon arrival in the laboratory, flasks were incubated in a cell culture incubator at 37 °c and 5% co₂ for 10 days until first passage. in parallel, a pinhead-sized piece of lung was taken for rna isolation in 1 ml trizol (qiagen, hilden, germany). after 10 days, trypsinized cells were resuspended in 2 ml dmem + 5% fcs + ps. for puuv rna screening, 325 µl of each cell suspension was taken for rna extraction and analyzed by rt-qpcr (see below). fresh veroe6 cells were resuspended in 2 ml dmem + 5% fcs + ps and 200 µl were mixed 1:1 with 200 µl of the inoculated cell suspension in a new 12.5 cm² flask. afterwards, 4 ml dmem + 5% fcs + ps were added and cells were incubated for 10 days until next passage. in parallel, one uninfected flask of veroe6 or mgn-2-r cells was passaged as a control. this procedure was continued until rt-qpcr-positive samples were detected. after first screening, only the flasks of the rt-qpcr-positive samples were further passaged. for detection of puuv nucleic acid, rna was extracted from homogenized lung tissue, or cell culture passages using qiazol lysis reagent (qiagen, hilden, germany) followed by a novel puuv s segment-specific rt-qpcr. for rt-qpcr, primers puuv-nss-s (5′-gwnatarcycgycatgarc-3′) and puuv-nss-as (5′-artgctgacactgtytgttg-3′) and the probe (5′-6-fam-crgtggrrrtgkacccrgatga-bhq-1-3′) were used. the pcr was done according to the quantitect probe one-step rt-qpcr mix (qiagen, hilden, germany) protocol and contained 20 pmol/µl of each primer and 5 pmol/µl probe (eurofins, hamburg, germany).
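the rt-qpcr primers above contain iupac ambiguity codes, so each one stands for a pool of concrete oligonucleotides. the sketch below counts and enumerates that pool. the primer strings are copied from the text with spaces removed; the function names are illustrative and not part of any published workflow.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


PUUV_NSS_S = "gwnatarcycgycatgarc"    # forward primer from the text
PUUV_NSS_AS = "artgctgacactgtytgttg"  # reverse primer from the text

if __name__ == "__main__":
    print("puuv-nss-s degeneracy:", degeneracy(PUUV_NSS_S))
    print("puuv-nss-as degeneracy:", degeneracy(PUUV_NSS_AS))
    for seq in expand(PUUV_NSS_AS):
        print(seq)
```

a high degeneracy means that each individual sequence makes up only a small share of the stated primer concentration, which is one reason such primers are usually placed on conserved stretches.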
the following cycler protocol was used: 30 min of reverse transcription at 50 °c; 15 min initial denaturation at 95 °c; 45 cycles of 10 sec at 95 °c, 25 sec at 50 °c and 25 sec at 72 °c. for quantification of the number of rna copies/µl and sample, an in vitro transcribed rna was used. the in vitro transcription of a plasmid coding for nucleotides 83-355 of the s segment of a puuv strain from baden-wuerttemberg (binder et al., unpublished) was done according to the protocol of the manufacturer (riboprobe® in vitro transcription system t7, promega gmbh, mannheim, germany). the transcribed rna was serially diluted from 10 -2 to 10 -11 ng/ml with 700 rna copies/µl limit of detection (lod). initial tissue samples were screened for puuv rna and viral load as rna copies/µl was determined in triplicates for organs of isolated positive animals. rna from the cell culture adapted strains puuv sotkamo and tulv moravia were used as positive and negative control for the rt-qpcr, respectively. for metagenomics, we extracted rna from either a pinheadsized piece of lung tissue or 250 µl cell culture supernatant using 750 µl qiazol lysis reagent (qiagen, hilden, germany) in combination with rneasy mini kit (qiagen, hilden, germany). for generation of complete genomes of cell culture supernatants, a previously published workflow was used [27] . double-stranded, non-directional cdna libraries from lung tissue for sequencing on the illumina platform were prepared from total rna using the nebnext ultra ii rna library prep kit for illumina (new england biolabs, ipswich, ma, usa). per reaction, a total of 100 ng rna was used as an input. rna was fragmented for 8 min and final cdna libraries were amplified by 8 cycles of pcr to complete adapter ligation and to generate enough material for target sequence enrichment. 
a custom-made mybaits target capture array (arbor biosciences, ann arbor, mi, usa), containing biotinylated rna probes against all available puuv sequences deposited in ncbi genbank database (august, 2018), was employed to capture puuv-containing sequences from total cellular cdna sequencing libraries. the hybridization-based sequence enrichment (chemistry v3) was performed according to the manufacturer's instructions (arbor biosciences, ann arbor, mi, usa). the enriched cdna sequencing libraries were amplified with 14 pcr cycles to produce enough dna material for hts on the illumina platform. the enriched cdna libraries were quantified with the nebnext library quantification kit (new england biolabs, ipswich, ma, usa), pooled in equimolar amounts, and sequenced with a 600 cycle miseq reagent kit v3 (illumina, san diego, ca, usa) using paired-end sequencing (2 × 300 cycles) on a miseq sequencer (illumina, san diego, ca, usa). the resulting reads were trimmed and assembled against the known complete genome of strain astrup from the osnabrück region [21] with geneious r11.1.5 (https://www.geneious.com). for sequences lacking the 5′ and 3′ ends of the m segment, rna ligation was done using t4 rna ligase (thermo fisher scientific, waltham, ma, usa) and subsequent in vitro transcription with a first strand cdna synthesis kit (thermo fisher scientific, waltham, ma, usa). sequences were obtained by conventional dideoxy-chain termination sequencing after pcr with primers puuv os m2 fwd (5′-tgagggcaattattatgtaa-3′) and puuv os m2 rev (5′-ccaattgtatgtgggcattcc-3′). the obtained sequences were deposited at genbank, accession numbers mn639737-mn639763. phylogenetic trees were reconstructed with four novel and 18 published concatenated s, m, and l coding sequences or 202 partial s segment sequences of 365 nucleotides length. published sequences of other hantaviruses were obtained from genbank.
analysis was performed by bayesian algorithms via mrbayes v.3.2.6 (https://sourceforge.net/projects/mrbayes/files/mrbayes/) on the cipres online portal [28]. a mixed nucleotide substitution matrix was specified in 4 independent runs of 10^7 generations. phylogenetic relations are shown as a maximum clade credibility phylogenetic tree with posterior probabilities for major nodes. for immunofluorescence assay (ifa), veroe6 and mgn-2-r cells were inoculated with 500 µl puuv osnabrück/v29 or puuv osnabrück/m43 supernatant in dmem + 5% fcs as described previously [10]. infected cells were fixed 10 days post infection with a 1:1 mixture of acetone and methanol for 20 min at − 20 °c. after fixation, cells were dried, re-hydrated with phosphate-buffered saline (pbs) and incubated with nucleocapsid (n) protein-specific antibody 5e11 [13] diluted 1:1000 in pbs for 1 h at room temperature (rt). a secondary anti-mouse alexa fluor 488 conjugated antibody (abcam, cambridge, uk) was used for detection of hantavirus proteins. nuclei were stained with 4′,6-diamidino-2-phenylindole (dapi, thermo fisher scientific). for titration studies of puuv, mgn-2-r and veroe6 cells were inoculated with 500 µl of the puuv osnabrück/v29 or puuv osnabrück/m43 virus isolate and passaged three times as described above. supernatants of both cell lines were collected after passage three and frozen at − 80 °c. subsequently, supernatants were serially diluted from 10^-1 to 10^-7 in dmem containing 5% fcs in a 96-well plate with three replicates each. a volume of 100 µl of each dilution was added to 24 h old cell monolayers of veroe6 cells in a 96-well plate. after incubation for 10 days, the virus titer was calculated using ifa for puuv n protein detection as described above. titers were calculated as 50% tissue culture infectious dose (tcid50)/ml by the spearman/kärber method [29] and mean titers of three experiments are given.
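the spearman/kärber calculation mentioned above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' script: it assumes tenfold dilutions starting at 10^-1, three wells per dilution, 100 µl inoculum per well, and a dilution series that runs from all wells positive to all wells negative. the function name and the example well counts are hypothetical.

```python
import math


def log10_tcid50_per_ml(positive_wells, wells_per_dilution,
                        first_log10_dilution=-1.0, log10_step=1.0,
                        inoculum_ml=0.1):
    """Spearman-Kaerber estimate of log10(tcid50/ml).

    positive_wells: number of ifa-positive wells at each dilution,
    ordered from the least diluted (e.g. 10^-1) to the most diluted.
    """
    proportions = [p / wells_per_dilution for p in positive_wells]
    # the estimator assumes the series spans 100% to 0% positive wells
    if proportions[0] < 1.0 or proportions[-1] > 0.0:
        raise ValueError("dilution series must span 100% to 0% positive wells")
    # log10 of the dilution at which 50% of the wells would be infected
    log10_endpoint = first_log10_dilution - log10_step * (sum(proportions) - 0.5)
    # convert from "per inoculum volume" to "per ml"
    return -log10_endpoint - math.log10(inoculum_ml)
```

for example, hypothetical well counts of 3, 3, 3, 1, 0, 0, 0 positive wells across the dilutions 10^-1 to 10^-7 give a 50% endpoint between 10^-3 and 10^-4, which the function scales to a titer per ml.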
titers after isolation (passage 3 of original lung tissue-derived sample) were used for comparison. for expression and generation of vlps in hek293 cells, a codon-optimized synthetic gene of the puuv gpc of the strain astrup [21] was purchased (geneart, regensburg, germany). the gene encoding the glycoproteins was pcr amplified using primer pair o grs 101/o grs 102 (aattaaggt acc tcc aga ggc gac acc cgg aacc and aattattaag ctt tca ggg ctt gtg ttc ttt gg) and the pcr product and the acceptor vector phan-1 (roman-sosa, unpublished) were digested with the restriction endonucleases kpni and hindiii. the expression plasmid phan-2 was generated by standard molecular biology protocols. in this plasmid, the endogenous signal sequence of the puuv gn is substituted by the igg-light chain signal sequence and a double strep-tag with a glycine/serine-rich linker between the tags. then a permanently transfected hek293 cell line was generated by transfection of the cells and selection in the presence of geneticin at 0.5 mg/ml. the vlps were affinity purified from the cell supernatants essentially as described [22]. recombinant vlps were used for five immunizations, four weeks apart, of female balb/c mice. hybridoma cells producing monoclonal antibodies (mabs) were generated by standard fusion procedure [30, 31] and screened using a 2 µg/ml stock solution of vlps according to an in-house elisa protocol [32] and buffers without tween. resulting mabs were analyzed by ifa and western blot test for their reactivity to puuv osnabrück/v29, puuv sotkamo, puuv vranica and tulv moravia. veroe6 cells were infected with puuv osnabrück/v29, puuv sotkamo, puuv vranica or tulv moravia at moi 0.1 in dmem + 5% fcs.
cells were harvested 10 (puuv osnabrück/v29, sotkamo) or 3 (puuv vranica, tulv moravia) days post infection in sds sample buffer (62.5 mm tris-hcl ph 6.8, 2% sds, 10% glycerol, 6 m urea, 0.01% bromophenol blue, 0.01% phenol red), and proteins were separated by sds page and blotted onto polyvinylidene fluoride (pvdf) membranes. after blocking, the membranes were cut into strips and incubated overnight at 4 °c with the antibodies 2e10 (1:1), 5f12 (1:1), 3b12 (1:200), 5b8 (1:1), 5h1 (1:1), 4g10 (1:100), 1b12 (1:2), 1g9 (1:100), 8g4 (1:50), 1h7 (1:1), 2h11 (1:5) or n protein-specific antibody 5e11 (1:1000 [13]), all diluted in pbs-tween 0.05%. a horseradish peroxidase (hrp) labeled secondary goat anti-mouse igg antibody diluted 1:3000 in pbs-tween 0.05% (bio-rad, hercules, ca, usa) was used for detection of hantaviral proteins. a rabbit anti-β-tubulin antibody (abcam, cambridge, uk) was used as a loading control. investigation of chest cavity lavage samples from bank voles was done by igg elisa using recombinant puuv strain bawa n protein, as described earlier [32]. the monoclonal antibody 5e11 was used as a positive control [13]; chest cavity lavage of an igg elisa- and rt-pcr-negative bank vole was used as negative control. chest cavity lavage samples with an optical density (od) value below the lower cut-off value were considered negative. positive and doubtful samples were retested. when the od value of the elisa was in the range between the lower and upper cut-off values defined according to our standard protocol [32], animals were considered doubtful. when the od value was above the upper cut-off value, the samples were considered positive. rodent trapping at five sites from april 11th to 12th, 2019 in the osnabrück region resulted in the collection of 57 bank voles [25].
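the three-way interpretation of elisa od values described above (negative, doubtful, positive) amounts to a simple threshold rule. the following python sketch illustrates it; the cut-off values are placeholders, since the actual values are defined in the cited standard protocol [32], and treating an od exactly equal to a cut-off as doubtful is an assumption made here for completeness.

```python
def classify_od(od, lower_cutoff, upper_cutoff):
    """Classify an elisa od value against a lower and an upper cut-off."""
    if lower_cutoff > upper_cutoff:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if od < lower_cutoff:
        return "negative"
    if od > upper_cutoff:
        return "positive"
    # between the two cut-offs (boundaries included by assumption)
    return "doubtful"


def needs_retest(result):
    """Positive and doubtful samples are tested a second time."""
    return result in ("positive", "doubtful")
```

with hypothetical cut-offs of 0.2 and 0.3, an od of 0.25 would be classified as doubtful and flagged for retesting.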
dissection on site and inoculation of veroe6 and bank vole mgn-2-r cells with homogenized lung samples resulted, after three blind passages, in four potential isolates that were detected by a novel puuv rt-qpcr (table s1, fig. 1). two of the potential candidates showed only low levels of puuv rna and were not able to consistently infect further passages (m52, m62). quantification by rt-qpcr analysis of different tissues from these four bank voles confirmed lung tissue for most of the samples as having the highest puuv rna load, although it was detected in almost all other tissues investigated (fig. s1). rt-qpcr investigation of lung tissues of all 57 bank voles resulted in the detection of hantavirus rna in 44 animals (tables 1, s1, [25]). puuv rna-positive animals originated from all five trapping sites. serological analysis of chest cavity lavages detected puuv n protein reactive antibodies in 24 of 57 bank voles (tables 1, s1). five additional animals, positive for puuv rna, were found to be equivocal in our serological test. all 24 antibody-positive animals were also found to be puuv rna positive, indicating a high number of persistently infected voles. fifteen additional bank voles were only positive for puuv rna, but not for anti-puuv antibodies, indicating a high number of acutely infected animals in spring in this region (table 1). interestingly, three of the four potential isolates originated from seronegative bank voles (table s1). two isolates (osnabrück v29 and osnabrück m43) were obtained by passaging in veroe6 or mgn-2-r cells, which reached titers of almost 10^3 tcid50/ml (fig. 2a and b, titer after isolation). shot-gun and hybrid-capture-mediated hts of both isolates resulted in the generation of complete genome sequences which are identical in sequence to the respective original strain in bank vole lung tissue except for one amino acid (aa) exchange each in the rna-dependent rna polymerase (rdrp) (fig. 3).
the genome organization of the novel puuv isolates indicated the typical sequence elements for puuv: the small (s) segment encodes an n protein of 433 aa residues and a putative nss protein of 90 aa in a +1 overlapping reading frame, the medium (m) segment codes for the 1148 aa gpc and the large (l) segment for the rdrp of 2156 aa (see fig. 3, genbank accession numbers: mn639737-mn639748). phylogenetic analysis of the concatenated s, m and l segment coding sequences grouped the novel isolates together with the astrup prototype strain in sister relationship to puuv sequences from france (fig. 4a). the phylogenetic analysis of a partial s segment sequence of the novel isolates and representative strains of all puuv clades and subclades from germany confirmed the close relationship of the new isolates to the osnabrück hills subclade (fig. 4b). the puuv osnabrück m43 isolate was found to be contaminated by a bank vole reovirus; hts-derived sequences of the passaged reovirus (genbank accession numbers: mn639755-mn639763) showed a strong similarity to a bank vole reovirus strain, but much lower similarity to a common vole reovirus [33]. the non-reovirus contaminated isolate osnabrück v29 from veroe6 cells was found to have an insertion of 20 nucleotides in the 3′ non-coding region (ncr) when compared to the other isolate and the astrup reference sequence (fig. 3). however, this insertion was also found in the original lung sample and therefore no cell culture-specific adaptations were observed in the ncrs of both virus isolates (fig. 3). figs. 1 and 2 ). this passaging resulted in no further mutations (genbank accession numbers: mn639749-mn639754). however, passaging of the virus isolate in veroe6 cells was accompanied by an increase in the virus titer to 10^4 tcid50/ml (fig. 2). in contrast, the passaging of the osnabrück v29 strain in mgn-2-r cells resulted in a decreased virus titer.
as no cytopathic effect was observed, virus detection for titration in both cell lines was done by immunofluorescence assay using an n protein-specific monoclonal antibody (fig. 2a). eleven monoclonal antibodies were produced in this study by immunization of mice with puuv strain astrup gpc-derived vlps. evaluation of the virus isolate osnabrück v29 using these monoclonal antibodies resulted in typical immunofluorescence patterns in the cytoplasm (fig. 5). further analysis by western blot test using a lysate of isolate osnabrück v29 from veroe6 cells suggested that the majority of anti-gpc antibodies are directed against conformational epitopes; however, some recognize linear epitopes in gc or gn (table 2). subsequent evaluation of the reactivity of these monoclonal antibodies with other puuv strains and tulv strain moravia indicated some level of cross-reactivity for some of them (table 2). here, we describe the first isolation of a central european puuv strain. this strain of the central european lineage increases the available panel of puuv isolates: currently available isolates sotkamo, umea, vranica, and kazan belong to the clades finnish, north scandinavian, most likely north scandinavian, and russian, respectively [34]. the puuv-like hokkaido virus strain kitahiyama128 originates from japan [12]. in our study, the isolation was based on an in-field dissection and inoculation of cells to prevent freeze/thaw cycles. the subsequent investigation of all 57 bank voles indicated that three of four isolates originated from anti-puuv-seronegative voles. this finding illustrates that a serological test in the field might be misleading in selection of samples for successful virus isolation. instead, an on-site molecular assay may enhance the chance for a successful virus isolation. nevertheless, the approach used here still indicates the challenges of hantavirus isolation; only four isolates were obtained from a total of 15 acutely infected bank voles.
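the animal counts reported for the 57 bank voles can be cross-checked with a few lines of arithmetic. the python sketch below uses only numbers stated in the text (44 rna-positive animals, of which 24 were antibody positive, 5 serologically equivocal and 15 antibody negative); the variable names are illustrative.

```python
total_voles = 57
rna_positive = 44
seropositive = 24            # all of these were also rna positive
equivocal_rna_positive = 5   # rna positive, serology equivocal
rna_only = 15                # rna positive, antibody negative


def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


# the three serological categories should add up to all rna-positive animals
assert seropositive + equivocal_rna_positive + rna_only == rna_positive

rna_prevalence = percent(rna_positive, total_voles)
seroprevalence = percent(seropositive, total_voles)
print(f"rna prevalence: {rna_prevalence:.1f}%")
print(f"seroprevalence: {seroprevalence:.1f}%")
```

the rna prevalence computed this way rounds to the 77% (44/57) given in the text.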
in addition, the determination of the complete genome sequences of two isolates including the ncrs expands our knowledge on the sequence diversity of puuv strains within the different regions of the genome. moreover, the hybrid-capture-based enrichment of puuv sequences allows a rapid determination of the complete genome and underlines the value of this workflow for hantavirus surveillance and molecular evolution studies [35] . a phylogenetic analysis of partial s segment nucleotide sequences confirmed the previously reported subclades of puuv in germany; the novel isolates belong to the subclade osnabrück hills within the central european clade. the position within the phylogenetic tree also confirms the local evolution pattern of puuv reported before [23, 36] . the observed high level of rt-qpcr-positive bank voles (44/57; 77%) confirms the district of osnabrück in spring 2019 as a hantavirus outbreak region [25] . the puuv rna detection rate was similarly high at all five trapping sites of bank voles. although 2019 was identified as a hantavirus outbreak year in germany, the distribution of notified human puuv cases was not as homogeneous as in previous outbreak years [25] . the passage of the puuv strains for isolation resulted in non-synonymous nucleotide exchanges in the l segment responsible for single amino acid exchanges in the rdrp (i3749m in m43 and d3963y in v29). the substituted amino acid residues are each very similar in their properties and, presumably, might not influence protein function. a more divergent adaptation at position s2053f has previously been observed for puuv strain kazan [8, 37] . although in this previous study nucleotide exchanges in the ncr of the s segment were observed [37] , here we did not find relevant mutations in this region after passaging in cell culture. the v29 strain showed an insertion in the 3′ ncr, but this insert was also found in the original lung material used for isolation. 
additionally, this sequence insert was found in another sequence from the same region (jn696358.1, [36]). the isolate v29 was shown to replicate in veroe6 and a bank vole kidney cell line. the low titer in the bank vole mgn-2-r cell line might be due to the evolutionary lineage origin of this cell line (carpathian lineage); in central europe puuv is harbored by the western evolutionary lineage with spillover to the carpathian lineage in regions with sympatric occurrence of both [24]. in line with the assumption of an association of a puuv clade with an evolutionary bank vole lineage, the vranica puuv strain replicated in mgn-2-r cells, but not in bank vole kidney cells of another evolutionary lineage [9, 10]. interestingly, replication of puuv-like hokkaido virus in cells of its host, the gray red-backed vole, was comparable to puuv infection [12]. future investigations in cell lines and animals of different bank vole lineages are required to confirm this conclusion directly.
[figure caption] reactivity of novel puuv gpc-specific monoclonal antibodies with hantavirus-infected veroe6 cells in immunofluorescence assay (ifa). antibodies were generated by immunization of balb/c mice with gpc-derived virus-like particles of puuv strain astrup. after screening and subcloning, monoclonal antibodies were tested in ifa. veroe6 cells were infected with puuv osnabrück v29 isolate on coverslips and fixed for ifa after 10 days. the monoclonal antibodies were administered for 1 h at rt. detection of the specific antibody binding was done using an anti-mouse alexa fluor 488 conjugated antibody. after staining, coverslips were mounted on glass slides for imaging.
the orthoreovirus contamination of one of the puuv isolates illustrates that bank voles may harbor additional infectious agents that may influence the susceptibility to puuv infections or their outcome.
of note, in bank voles several viruses have been detected, i.e., polyoma-, herpes- and hepaciviruses [38] [39] [40] [41], but also bacterial agents and endoparasites [42] [43] [44]. similarly, a hantavirus isolation approach was previously hampered by coinfection with a striped field mouse adenovirus [45]. future investigations are needed to evaluate potential influences of coinfections in bank voles. it has been shown that hantavirus gn and gc form complex spike-shaped structures [46] that build conformational epitopes [17, 18]. therefore, we selected an immunization procedure using puuv-gpc-derived vlps, as the organization of the glycoproteins resembles that of the virion. a panel of eleven monoclonal antibodies was produced here and all of them were reactive with the new puuv isolate in immunofluorescence assay. the staining pattern, which is reminiscent of that of the secretory pathway organelles, i.e., the golgi apparatus and the endoplasmic reticulum, suggests that the epitopes recognized by these antibodies are already accessible during the maturation process of the proteins. interestingly, some of the monoclonal antibodies recognize linear epitopes as revealed by a western blot assay. although preliminary results suggest that the antibodies do not neutralize the virus when tested individually, synergistic effects with a protective effect cannot be ruled out yet, as shown for anti-ebola virus monoclonal antibodies [47]. therefore, the novel antibodies represent a useful tool for further experimental, diagnostic, and therapeutic applications. in conclusion, the puuv isolate described here replicates in a bank vole cell line and its n and gpc proteins can be detected by specific monoclonal antibodies. therefore, this isolate will be useful for further studies on the virulence markers of central european puuv, its reservoir host association and the route of pathogenicity in the bank vole model.
the novel gpc-specific monoclonal antibodies will enable future studies on virus entry and important domains for exposed immunogenic regions.
funding florian binder acknowledges intramural funding by the friedrich-loeffler-institut. additional funding was provided by the bundesministerium für bildung und forschung through the research network zoonotic infections (robopub consortium, fkz 01ki1721a, awarded to rgu; fkz 01ki1721h, awarded to laves) for trapping and rodent screening, the rapid project within the infect control
[table legend] veroe6 cells were inoculated with puumala virus (puuv) osnabrück/v29, puuv sotkamo, puuv vranica or tula virus (tulv) strain moravia. infected cells were fixed 10 (puuv osnabrück/v29, sotkamo) or 3 (puuv vranica, tulv moravia) days post infection for immunofluorescence assays or collected in sample buffer for western blot analysis. after fixation or western blot transfer, novel gpc-specific mabs 2e10, 5f12, 3b12, 5b8, 5h1, 4g10, 1b12, 1g9, 8g4, 1h7, and 2h11 were administered. gn- and gc-reactive mabs were assigned where possible according to molecular weight of the immunoreactive bands in western blot analysis. − negative; (+) weak reactivity; + positive; ++ strongly positive
conflict of interest the authors declare that they have no competing interests.
ethical approval all animals were handled according to the applicable institutional, national and international guidelines for the care and use of animals. bank vole trapping was conducted in line with the regular pest control of the laves veterinary task-force in lower saxony, germany (department of pest control, oldenburg) according to german federal law (§ 18, gesetz zur verhütung und bekämpfung von infektionskrankheiten beim menschen). the immunization of mice was done in line with the general immunization program of the friedrich-loeffler-institut (landesamt für landwirtschaft, lebensmittelsicherheit und fischerei, mecklenburg-vorpommern, permit: 28/17).
open access this article is licensed under a creative commons attribution 4.0 international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons licence, and indicate if changes were made. the images or other third party material in this article are included in the article's creative commons licence, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. to view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
[1] hantavirus infections
[2] hantaviruses-globally emerging pathogens
[3] weiss s (2019) molecular and epidemiological characteristics of human puumala and dobrava-belgrade hantavirus infections
[4] coding strategy of the s and m genomic segments of a hantavirus representing a new subtype of the puumala serotype
[5] isolation of the causative agent of hantavirus pulmonary syndrome
[6] propagation of nephropathia epidemica virus in cell culture
[7] isolation and characterization of puumala hantavirus from norway: evidence for a distinct phylogenetic sublineage
[8] cell culture adaptation of puumala hantavirus changes the infectivity for its natural reservoir, clethrionomys glareolus, and leads to accumulation of mutants with altered genomic rna s segment
[9] a new permanent cell line derived from the bank vole (myodes glareolus) as cell culture model for zoonotic viruses
[10] common vole (microtus arvalis) and bank vole (myodes glareolus) derived permanent cell lines differ in their susceptibility and replication kinetics of animal and zoonotic viruses
[11] more novel hantaviruses and diversifying reservoir hosts-time for development of reservoir-derived cell culture models? viruses
[12] isolation of hokkaido virus, genus hantavirus, using a newly established cell line derived from the kidney of the grey red-backed vole (myodes rufocanus bedfordiae)
[13] characterization of monoclonal antibodies against hantavirus nucleocapsid protein and their use for immunohistochemistry on rodent and human samples
[14] sensitive detection of hantaviruses by biotin-streptavidin enhanced immunoassays based on bank vole monoclonal antibodies
[15] novel serological tools for detection of thottapalayam virus, a soricomorpha-borne hantavirus
[16] bank vole monoclonal antibodies against puumala virus envelope glycoproteins: identification of epitopes involved in neutralization
[17] the use of chimeric virus-like particles harbouring a segment of hantavirus gc glycoprotein to generate a broadly-reactive hantavirus-specific monoclonal antibody
[18] human recombinant neutralizing antibodies against hantaan virus g2 protein
[19] hantavirus gn and gc envelope glycoproteins: key structural units for virus cell entry and virus assembly
[20] virus-like particles: a versatile tool for basic and applied research on emerging and reemerging viruses. viral nanotechnologies
[21] complete genome of a puumala virus strain from central europe
[22] protocadherin-1 is essential for cell entry by new world hantaviruses
[23] spatiotemporal dynamics of puumala hantavirus associated with its rodent host myodes glareolus
[24] host-associated absence of human puumala virus infections in northern and eastern germany
[25] heterogeneous puumala orthohantavirus situation in endemic regions in germany in summer
[26] aphaea/ewda species card: voles and mouses
[27] a versatile sample processing workflow for metagenomic pathogen detection
[28] a restful api for access to phylogenetic tools via the cipres science gateway
[29] beitrag zur kollektiven behandlung pharmakologischer reihenversuche
[30] antigenic and cellular localisation analysis of the severe acute respiratory syndrome coronavirus nucleocapsid protein using monoclonal antibodies
[31] indirect elisa based on hendra and nipah virus proteins for the detection of henipavirus specific antibodies in pigs
[32] phylogenetic analysis of puumala virus subtype bavaria, characterization and diagnostic use of its recombinant nucleocapsid protein
[33] isolation and complete genome characterization of novel reassortant orthoreovirus from common vole (microtus arvalis)
[34] phylogeography of puumala orthohantavirus in europe
[35] secondary contact between diverged host lineages entails ecological speciation in a european hantavirus
[36] multiple synchronous outbreaks of puumala virus
[37] adaptation of puumala hantavirus to cell culture is associated with point mutations in the coding region of the l segment and in the noncoding regions of the s segment
[38] evidence for novel hepaciviruses in rodents
[39] identification of two novel members of the tentative genus wukipolyomavirus in wild rodents
[40] identification of novel rodent herpesviruses, including the first gammaherpesvirus of mus musculus
[41] molecular detection and characterization of the first cowpox virus isolate derived from a bank vole
[42] leptospira genomospecies and sequence type prevalence in small mammal populations in germany
[43] high prevalence of rickettsia helvetica in wild small mammal populations in germany. ticks and tick-borne diseases
[44] occurrence of gastrointestinal parasites in small mammals from germany. vector borne zoonotic dis
[45] a novel cardiotropic murine adenovirus representing a distinct species of mastadenoviruses
[46] molecular organization and dynamics of the fusion protein gc at the hantavirus surface
[47] cooperativity enables non-neutralizing antibodies to neutralize ebolavirus
acknowledgements open access funding provided by projekt deal. the authors would like to thank sönke röhrs for help with rodent trapping, stephan drewes for help with phylogenetic analysis and sven sander and patrick zitzow for excellent technical support with generation of monoclonal antibodies and sequencing of puuv isolates. the authors thank martin beer, klaus osterrieder and nicole tischler for constant support and helpful discussions.
author contributions rgu and fb designed the study and wrote the manuscript. fb did virus isolation, infection studies, sequence analysis, phylogenetic analysis, and testing of monoclonal antibodies. sr and fb generated and screened the monoclonal antibodies. grs produced the vlps for immunization. ms and fb performed rodent trapping. dh, jt, and dk did the complete genome sequencing of puuv isolates. rr developed the puuv-specific rt-qpcr assay. all authors gave significant ideas for the presented work and were involved in writing and proof reading of the manuscript.
key: cord-296791-h8ftslps authors: junwei, ge; baoxian, li; lijie, tang; yijing, li title: cloning and sequence analysis of the n gene of porcine epidemic diarrhea virus ljb/03 date: 2006-10-01 journal: virus genes doi: 10.1007/s11262-005-0059-z sha: doc_id: 296791 cord_uid: h8ftslps the nucleocapsid (n) gene of the porcine epidemic diarrhea virus (pedv) strain ljb/03, which was previously isolated in heilongjiang province, china, was cloned, sequenced and compared with published sequences of other avian and mammalian coronaviruses. the nucleotide sequence encoding the entire n gene open reading frame (orf) of ljb/03 was 1326 bases long and encoded a protein of 441 amino acids with a predicted mr of 49 kda. it consisted of 405 adenine (30.5%), 294 cytosine (22.2%), 329 guanine (24.8%) and 298 thymine (22.5%) residues. sequence comparison with other pedv strains selected from genbank revealed that the ljb/03 n gene has a high sequence homology to those of other pedv isolates: 97.4% with js2004, 95.6% with chinju99, 96.6% with br1/87, and 96.8% with cv777. the encoded protein shared 96.4% amino acid identity with cv777, 96.1% with br1/87, 98% with js2004, and 96.9% with chinju99. the amino acid sequence contained seven potential protein kinase c phosphorylation sites, nine casein kinase ii phosphorylation sites, one tyrosine kinase phosphorylation site, and two camp- and cgmp-dependent protein kinase phosphorylation sites. porcine epidemic diarrhea virus (pedv) is classified as a member of the coronaviridae and causes acute enteritis in pigs [1]. the disease was first reported in england in 1971 and has since been reported in many countries such as germany, canada, japan, korea, france, belgium, switzerland, etc. [2] [3] [4] [5]. in china, the disease was first reported in 1976 [6] and has caused serious economic losses due to the death of neonatal piglets and weight loss of the infected pigs.
clinical signs of ped include anorexia, vomiting, diarrhea, and dehydration. morbidity and mortality in infected neonatal piglets less than 5 days old approach 100% because of severe diarrhea and dehydration. however, mortality in infected piglets older than 10 days is less than 10% [1]. the coronavirus genome consists of a positive-sense, single-stranded rna molecule that is 20-30 kb in size [7]. virions are enveloped, pleomorphic, and 80-220 nm in diameter, and they have club-shaped peplomers approximately 20 nm in length. coronaviruses possess four major structural proteins: a phosphorylated nucleocapsid (n) protein and three envelope proteins, namely the membrane protein (m), spike protein (s), and envelope protein (e). the first two envelope proteins are major envelope proteins, while the amount of e protein in the virion is low [8] [9] [10]. the s glycoprotein makes up the large surface projections of the virion, and the m and e proteins are essential for viral envelope formation and release [11, 12]. studies indicate that the n proteins of coronaviruses are extensively phosphorylated, highly basic, and bind to the viral genomic rna, forming a helical ribonucleoprotein (rnp) [13]. a variety of functional activities have been ascribed to the n proteins of previously known coronaviruses, including participation in transcription of the viral genome, formation of the viral core, and packaging of viral rna [14]. the n protein is highly immunogenic; moreover, the cellular immune response against the n protein of some animal coronaviruses can enhance recovery from the virus infection [15, 16]. the n protein can accumulate intracellularly even before it is packed into the mature virus [17] and is the most abundant virus-derived protein throughout the infection, probably because its template mrna is the most abundant subgenomic rna [18]. these features make it a suitable candidate for accurate and early diagnosis and for the development of genetically engineered vaccines [19].
the aim of the present study was to determine the complete nucleotide sequence of the pedv n gene and to obtain more information about pedv isolates from different regions. in this study, the rna of pedv was extracted directly from the feces samples of piglets naturally infected with pedv ljb/03. the n gene was cloned, sequenced and compared with other pedv strains. these data are useful for further study of the molecular biology of pedv strains that are prevalent in china. virus strain pedv ljb/03 was collected from the feces of piglets suffering from severe diarrhea in heilongjiang, china. the feces sample was processed following the methods of fan jinghui and li yijing [20]. the feces sample was diluted 1:10 in a disruption buffer (500 mm tris-hcl [ph 8.3], 2% (w/v) pvp-40, 1% (w/v) peg6000, 140 mm nacl, 0.05% (v/v) tween 20), vortexed, incubated at room temperature for 10 min, and centrifuged using a beckman f3602 rotor at 2000 × g at 4 °c for 5 min. the supernatant was removed and used for the extraction of the viral rna using the trizol reagent (invitrogen, usa) according to the manufacturer's protocol, and the rna was dissolved in diethyl pyrocarbonate-treated distilled water. a pair of sense and antisense primers was designed based on an alignment of the nucleotide sequences of the n gene of cv777 and br1/87 available in genbank. the sense primer 5′-ttatggcttctgtcagcttt-3′ and antisense primer 5′-acattgtttaatttcctgtatc-3′ were used to amplify the n gene coding for the n protein of pedv strain ljb/03. synthesis of the first-strand cdna for the n gene was carried out by reverse transcription using promega reverse transcription reagent. the viral rna (50 µl) was mixed with 2.5 µl of 10 pm of the antisense primer, incubated at 65 °c for 5 min, and then placed on ice for 2 min. after that, 4 µl of 5× rt buffer, 4 µl of 2.5 mm dntp mixture, 1 µl of rnase inhibitor (40 u/µl), 1 µl of reverse transcriptase (200 u/µl), and 2.5 µl h2o were added and mixed gently.
the reaction mixture was incubated for 50 min at 42 °c, and the reaction was terminated by heating for 10 min at 65 °c. rnase h (1 µl) was added to degrade the rna template for 20 min at 37 °c prior to pcr amplification. pcr was carried out in a 50 µl volume by mixing the cdna above with 2.5 µl each of the 10 pm sense and antisense primers, 1 mm each of datp, dgtp, dttp, and dctp, 5 µl of 10× pcr buffer (100 mm tris-hcl, 1.5 mm mgcl2, 50 mm kcl, ph 8.3), and 2.5 u taq dna polymerase (takara biotechnology (dalian) co. ltd.). cycles were as follows: 94 °c for 15 s, followed by 30 cycles of 94 °c for 40 s, 49 °c annealing for 30 s, and 72 °c extension for 1 min, and a final extension at 72 °c for 5 min. the pcr product was analyzed by electrophoresis through an agarose gel (fig. 1) and visualized by staining with ethidium bromide. the target cdna band was extracted from the gel using the qiagen® gel extraction kit according to the manufacturer's instructions. the purified pcr products were cloned into the pgem-t easy vector (promega, madison, usa) with t4 dna ligase. the plasmids were transformed into e. coli dh5α using standard molecular techniques. plasmid dna was extracted by alkaline lysis from e. coli dh5α culture and verified by restriction enzyme digestion, pcr and electrophoresis in 1% agarose (fig. 2). the plasmid carrying an insert of the correct size was named pgem-t-n, and at least three independent plasmid clones were analyzed, confirmed and sequenced. the nucleotide sequence of the n gene of ljb/03 was determined by takara biotechnology (dalian) co. ltd. amino acid sequences were aligned using the clustal w method, and phylogenetic trees were constructed using the neighbor-joining method. analyses were done using the megalign application of the lasergene software package. the identification of sequence motifs was done with the psi-blast program using the swiss-prot database through the myhits web server (http://myhits.isb-sib.ch). using the rt-pcr method, we successfully amplified the nucleocapsid gene.
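as a rough plausibility check of the primer pair and the 49 °c annealing step described above, basic primer properties can be computed directly from the two sequences given in the text. the python sketch below uses the wallace rule of thumb (2 °c per a/t plus 4 °c per g/c), which is an assumption chosen here for illustration; the source does not state how the annealing temperature was selected, and the rule is only a coarse approximation.

```python
SENSE = "ttatggcttctgtcagcttt"        # sense primer from the text
ANTISENSE = "acattgtttaatttcctgtatc"  # antisense primer from the text


def base_counts(seq):
    """Count a, c, g and t in a primer sequence (case-insensitive)."""
    seq = seq.lower()
    return {base: seq.count(base) for base in "acgt"}


def gc_percent(seq):
    """G+C content of the primer in percent."""
    counts = base_counts(seq)
    return 100.0 * (counts["g"] + counts["c"]) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature (°c) by the wallace rule."""
    counts = base_counts(seq)
    return 2 * (counts["a"] + counts["t"]) + 4 * (counts["g"] + counts["c"])


for name, primer in (("sense", SENSE), ("antisense", ANTISENSE)):
    print(name, len(primer), "nt,",
          f"gc {gc_percent(primer):.1f}%,",
          f"tm ~{wallace_tm(primer)} °c")
```

an annealing temperature a few degrees below the estimated melting temperature of both primers is a common starting point, which is consistent with the 49 °c used here.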
the pcr products were approximately 1.3 kb in size and were cloned into the pgem-t easy vector. the complete nucleotide sequence of the nucleocapsid gene has been deposited in genbank under accession number dq072726. sequence analysis indicates that the complete open reading frame (orf) of the nucleocapsid gene of pedv ljb/03 consists of 1326 bases and codes for a basic protein of 441 amino acids. it consists of 405 adenines (30.5%), 294 cytosines (22.2%), 329 guanines (24.8%) and 298 thymines (22.5%), giving a g+c content of 47.0%. the motif search indicated that the ljb/03 n protein has seven potential protein kinase c phosphorylation sites, nine casein kinase ii phosphorylation sites, one tyrosine kinase phosphorylation site, and two camp- and cgmp-dependent protein kinase phosphorylation sites. the gene has 43 nucleotide mismatches compared to cv777; a substantial portion (72%, 31/43) of the substitutions are transversions, and about 60% of the substitutions are non-synonymous. table 1 shows that the percent similarity of the n nucleotide sequences varied from 95.6% to 97.4% between ljb/03 and the other four strains of pedv, and a high degree of identity (94.9-99.8%) was observed among the nucleotide sequences of the pedv strains. the alignment of the nucleotide sequences shows that no deletion or insertion was detected, and that there are large regions of absolute identity, such as the region from nucleotide 517 to nucleotide 614. the entire nucleocapsid protein of pedv ljb/03 was aligned with the published sequences of cv777, brl/87, chinju99 and js2004. this alignment indicates that overall the sequences are highly conserved, with some regions showing no variation at all. the 15 nucleotide substitutions in the 5′ region (bases 1-249) did not cause amino acid changes, which may suggest that the n-terminal region of the protein is more conserved than the c-terminal region.
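the composition figures reported for the ljb/03 n gene can be re-derived from the base counts given in the text. the following python sketch is illustrative only: the function names are hypothetical and are not part of the original analysis, and the purine/pyrimidine rule used to separate transitions from transversions is the standard textbook definition rather than a procedure stated by the authors.

```python
# base counts for the ljb/03 n gene orf, as reported in the text
COUNTS = {"A": 405, "C": 294, "G": 329, "T": 298}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def composition(counts):
    """Return (total length, percent of each base, percent G+C)."""
    total = sum(counts.values())
    percent = {base: 100.0 * n / total for base, n in counts.items()}
    gc_percent = 100.0 * (counts["G"] + counts["C"]) / total
    return total, percent, gc_percent


def substitution_type(ref_base, alt_base):
    """Classify a single-base mismatch (hypothetical helper for illustration)."""
    pair = {ref_base.upper(), alt_base.upper()}
    if len(pair) != 2:
        raise ValueError("bases must differ")
    # a transition stays within purines or within pyrimidines
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


total, percent, gc_percent = composition(COUNTS)
print(total, {b: round(p, 1) for b, p in percent.items()}, round(gc_percent, 1))

# share of transversions among the 43 mismatches reported against cv777
transversion_share = 100.0 * 31 / 43
print(round(transversion_share))
```

the four counts sum to the 1326-base orf length, which is a quick consistency check on the reported numbers.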
two-way comparisons among the nucleocapsid proteins of these five strains of pedv indicate that the identities range from 95.9% to 98.0%, with cv777 and brl/87 being the most similar and ljb/03 and chinju99 the least. a phylogenetic tree was prepared to further examine relationships between pedv and other coronaviruses based on a comparison of n protein amino acid sequences (fig. 3). phylogenetic analysis showed that pedv was more closely related to group 1 (tgev, hcv 229e and fipv) than to members of group 2 korea are more closely related to each other than they are to the two european isolates cv777 and brl/87. in the present study, the n gene of ljb/03 was cloned and sequenced. the resulting sequence revealed that the n gene has an orf of 1326 nucleotides coding for a protein of 441 amino acids. sequence comparison with other pedv strains selected from genbank indicated that the n gene of pedv is highly conserved even among isolates from different geographic regions, and the alignment showed that there are some regions of absolute identity in the sequences. previous studies showed that the chinju99 n protein has seven potential t- or s-linked phosphorylation sites and seven potential casein kinase ii phosphorylation sites; the results of this study indicate that the ljb/03 n protein has seven potential protein kinase c phosphorylation sites, nine casein kinase ii phosphorylation sites, one tyrosine kinase phosphorylation site, and two camp- and cgmp-dependent protein kinase phosphorylation sites. the entire nucleocapsid protein of pedv ljb/03 was aligned with the published sequences of cv777, brl/87, chinju99 and js2004. this alignment of nucleocapsid protein sequences indicates that overall the sequences are highly conserved, with some regions showing no variation at all. this information may be useful for the development of a genetically engineered n protein vaccine to prevent pedv infections. shuichi et al.
developed a method for detecting pedv by polymerase chain reaction based on part of the nucleocapsid gene, and then compared the nucleocapsid nucleotide sequences among strains of the virus. restriction analysis of the pcr products showed that cv777 and all the korean strains could be digested with dra i and ecor i, but the korean strains were not digested with pst i. we found that the n genes of ljb/03 and js2004 (another chinese isolate) have the same restriction patterns as the korean strains [21]. figure legend: amino acid sequences were aligned using the clustal method, and phylogenetic trees were constructed using the neighbor-joining method. analyses were done using the megalign application of the lasergene software. genbank accession numbers of sequences in the phylogenetic tree are: ljb/03 dq072726; chinju99 af237764; cv777 nc003436; brl/87 z24733 (britain isolate); js2004 ay653206 (china field isolate). coronaviruses have been subdivided into three major antigenic groups based on antigenic differences identified by serological analyses and on nucleotide sequence analyses [22, 23]. group i members are the porcine transmissible gastroenteritis virus (tgev) and epidemic diarrhea virus (pedv), feline and canine coronavirus (fcov and ccov), and human coronavirus 229e (hcov-229e). group ii includes porcine hemagglutinating encephalomyelitis virus (hev), murine hepatitis virus (mhv), bovine, equine, and rat coronavirus (bcov, ecov, and rtcov), and human coronavirus oc43 (hcov-oc43). group iii is specific for avian species and includes turkey coronavirus (tcov), pheasant coronavirus and avian infectious bronchitis virus (ibv). the coronavirus n protein has been shown to be highly variable in size as well as in amino acid composition between the viruses that comprise the three coronavirus antigenic groups but highly conserved within these groups.
group i viral genomes have the smallest nucleocapsid protein with 378-389 residues, group ii genomes have the largest with 449-455 residues, and group iii has 409 residues. all 5 pedv strains had 441 amino acid residues, a longer polypeptide than in other group i members, which indicates that pedv is an exception to the rule that the coronavirus n protein is highly conserved within these groups. in this study, we obtained the nucleotide sequence of the n gene of pedv ljb/03 and analyzed it to establish the phylogenetic relationships between several strains of pedv. this work showed that the nucleotide sequence can form a basis for further epidemiological study of pedv infections. acknowledgement the financial support of this work was provided by grants from ''project of the tenth-five'' of heilongjiang provincial scientific and technique committee, china. key: cord-005253-8qja4j9h authors: li, weike; li, tiansong; liu, yuxiu; gao, yuwei; yang, songtao; feng, na; sun, heting; wang, shengle; wang, lei; bu, zhigao; xia, xianzhu title: genetic characterization of an isolate of canine distemper virus from a tibetan mastiff in china date: 2014-04-02 journal: virus genes doi: 10.1007/s11262-014-1062-z sha: doc_id: 5253 cord_uid: 8qja4j9h canine distemper (cd) is a highly contagious, often fatal, multisystemic, and incurable disease in dogs and other carnivores, which is caused by canine distemper virus (cdv). although vaccines have been used as the principal means of controlling the disease, cd has been reported in vaccinated animals. the hemagglutinin (h) protein is one of the most important antigens for inducing protective immunity against cd, and antigenic variation of recent cdv strains may explain vaccination failure. in this study, a new cdv isolate (tm-cc) was obtained from a tibetan mastiff that died of distemper, and its genome was characterized.
phylogenetic analysis of the h gene revealed that the cdv-tm-cc strain is unique among 20 other cdv strains and can be classified into the asia-1 group with the chinese strains, hebei and hlj1-06, and the japanese strain, cyn07-hv. the h gene of cdv-tm-cc shows low identity (90.4 % nt and 88.9 % aa) with the h gene of the classical onderstepoort vaccine strain, which may explain the inability of the tibetan mastiff to mount a protective immune response. we also performed a comprehensive phylogenetic analysis of the n, p, and f protein sequences, as well as potential n-glycosylation sites and cysteine residues. this analysis shows that an n-glycosylation site at aa 108-110 within the f protein of cdv-tm-cc is specific for the wild-type strains (5804p, a75/17, and 164071) and the asia-1 group strains, and may be another important factor for the poor immune response. these results provide important information for the design of cd vaccines in the china region and elsewhere. analysis of cdv strains from various animal samples has demonstrated an important relationship with the h gene/ glycoprotein, which has changed by genetic/antigenic drift. as the key protein for cdv, h is used for attachment to cell receptors as the first step of infection and mediates adequate host immune response [9] . the h protein is considered to have the highest antigenic variation and can reflect genetic changes in comparative studies of cdv strains [10] [11] [12] [13] . this variation may affect neutralization-related sites with disruption of important epitopes. analysis of cdv strains from different animal species and geographical settings has revealed that the geographic pattern is an important factor in the genetic/antigenic drift affecting the h gene/glycoprotein of cdv [14] [15] [16] [17] [18] [19] [20] . 
therefore, the h gene may be used for identification and phylogenetic classification of cdv strains, which have been grouped into seven major genetic lineages (america-1 and -2, asia-1 and -2, arctic-like, europe, and wild-life) [21, 22], and as an indication of the antigenic response of the virus. three other proteins, the nucleocapsid (n) protein, the phosphoprotein (p), and the fusion (f) protein, also have important roles for cdv and could provide additional sources of antigenic variability among strains. the n protein has immunosuppressive properties and is the major component of the cdv virion. the n-terminal domain of the n protein is generally well conserved, while the c-terminal end is poorly conserved and is considered hypervariable. the c-terminal tail of the n protein also contains the majority of its phosphorylation sites and antigenic sites [23, 24]. during active infection, antibodies made against the n protein in the host are predominant and account for most of the complement-fixing antibody [25, 26]. the p protein is relatively well conserved and plays a vital role in transcription and replication [27]. this protein is an essential component of the viral rna phosphoprotein complex (vrnap) [28] and also functions as a chaperone for the n protein. the f protein is a type i integral membrane protein that mediates viral penetration by fusion between the virion envelope and the host cell plasma membrane at neutral ph. it is synthesized as an inactive precursor, f0, and must be proteolytically cleaved to produce the functionally active fusion protein, which consists of disulfide-linked f1 and f2 polypeptides [29]. like the h protein, the f protein has high antigenic variation. in this study, the wild-type cdv-tm-cc strain was isolated from the spleen of a 1-year-old tibetan mastiff that developed clinical signs of cd after having received all standard vaccines.
to determine whether this occurrence may be explained by variations in specific nucleotide or amino acid residues of the cdv circulating in china, we sought to genetically characterize the cdv-tm-cc strain. verodogslam cells constitutively expressing the cdv receptor, dog signaling lymphocyte activation molecule (slam), were cultured in dulbecco's modified eagle medium (dmem; gibco) supplemented with 10 % heat-inactivated fetal bovine serum (fbs) and an additional 8 µg of g418 per ml. the wild-type cdv-tm-cc strain was originally isolated from spleen homogenate (10 % w/v suspension) from a tibetan mastiff that succumbed to natural infection. virus was propagated in verodogslam cells and stored at -80°c. total rna was prepared from verodogslam cells infected with cdv-tm-cc according to the manufacturer's instructions (total rna kit i, omega). the reverse transcription reactions were performed using m-mlv reverse transcriptase (invitrogen) with oligo d(t) and random primers. according to the complete consensus genomic sequence of cdv (genbank), two sets of primers were designed to amplify the entire genome (oligo6.0 design software), as shown in table 1. pcr amplification was carried out using phusion high-fidelity dna polymerase (new england biolabs). sequences were assembled and compared using dna sequence analysis software (dnastar), and the complete consensus genomic sequence was determined. table note: a genbank number is provided for each of the strains of cdv that were compared with cdv-tm-cc in this study. the geographical location of strain isolation and the species/organ of isolation are also indicated, as well as the clade into which the strains are categorized. virus genes (2014) 49: 45-57. clones (amplicons encompassing the full-length cdv-tm-cc genome) were obtained
from thirty rt-pcr reactions using cdv-specific oligonucleotides. to genetically characterize the cdv-tm-cc strain, the deduced amino acid sequence was compared to f and h gene fragments of the variant field isolates shown in table 2. multiple sequence alignment was carried out using clustalw, and a phylogenetic tree was constructed based on the deduced amino acid sequences in supplementary table 1 using mega 5.0. statistical significance of the phylogeny was estimated by bootstrap analysis with 1,000 pseudoreplicate data sets. the wild-type cdv-tm-cc strain was isolated from the spleen of a 1-year-old tibetan mastiff in jilin province that had succumbed to cd after having received all standard vaccines (first immunization at 6 weeks, second at 8 weeks and third at 10 weeks with a quadruple distemper, adenovirus type 2, parvovirus and parainfluenza vaccine; canine coronavirus disease killed virus vaccine portion, usa). the virus was propagated in verodogslam cells and the virulence of the strain was confirmed (data not shown). to identify sequence features that may explain the failure of the vaccine strain to protect the dog against cd, we sequenced the entire genome using two sets of overlapping primers (table 1). within the cdv genome, the h gene is a major determinant of disease and also has one of the highest rates of mutation. consequently, the phylogenetic relationship of cdv strains is often based on the deduced amino acid sequence of the h protein. the h gene of the cdv-tm-cc strain has 1,824 nucleotides and the inferred protein sequence has 607 amino acids, similar to the other cdv strains. amino acid analysis of the h protein from cdv-tm-cc and 20 other cdv strains in genbank (table 2) identified seven clades of cdv strains (america-1, america-2, asia-1, asia-2, europe, arctic-like, and europe wildlife). cdv-tm-cc was classified into the asia-1 group with the strains cyn07-hv (japan), hebei (china), and hlj1-06 (china) (fig. 1 (fig. 2a, b) .
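the relationship between the reported h gene length (1,824 nucleotides) and the inferred protein length (607 amino acids) can be checked with simple arithmetic. the sketch below assumes that the quoted gene length includes the terminal stop codon, which is consistent with the two numbers but is not stated explicitly in the text; the function name is hypothetical.

```python
def protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides.

    Assumes the ORF is in frame; includes_stop says whether the quoted
    length counts the terminal stop codon (an assumption of this sketch).
    """
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("orf length must be a positive multiple of three")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


# h gene of cdv-tm-cc: 1,824 nt = 608 codons = 607 residues + 1 stop codon
h_protein_residues = protein_length(1824)
print(h_protein_residues)
```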
glycosylation is an important factor in determining the antigenicity of many proteins [30]. prediction of the glycosylation sites of the h gene (http://www.cbs.dtu.dk/services/netnglyc/, netnglyc 1.0 server) identified a total of eight potential glycosylation sites at positions 19 notably, the 309-311 n-glycosylation site is specific for virulent strains [14, 18] with the exception of a75/17. the 584 n-glycosylation site is specific for the asia-1 strains, suggesting that it was acquired later [18, 20]. cdv-tm-cc has both of these predicted glycosylation sites, which could explain its virulence properties. phylogenetic analyses of the amino acid sequence of the n and p proteins. to determine whether the conservation of cdv-tm-cc also extends to other proteins within the virus, we assessed the similarity of the n and p proteins. consistent with the results for the h protein, the homology of the deduced cdv-tm-cc amino acid sequence of the n protein to the asia-1 strains (cyn07-hv, hlj1-06, and hebei) was high, with 98.7-98.9 % identity, as shown in fig. 4. the n protein sequence of cdv-tm-cc also showed 98.1 % identity with the asia-2 group (strains m25cr, 007lm, 011c, 50con, and 55l), and 97.5 % identity with the onderstepoort strain. moreover, cdv-tm-cc had high similarity (98.5, 98.7, and 97.9 % identity) with wild-type strains 164071, a75/17, and 5804p. the lowest homology of the cdv-tm-cc n protein sequence (96.6-96.8 % aa identity) was found with arctic-like strains cdv3, shuskiy, and phoca-caspian-2007. this relatively high similarity between the n protein of cdv-tm-cc and other cdv strains is consistent with the generally high conservation among n proteins. the phylogenetic relationship of cdv-tm-cc based on the deduced amino acid sequence of the p protein was also analyzed (fig. 5). similar to the results for the h protein, cdv-tm-cc was classified into the asia-1 group, but was in a separate branch from the classical onderstepoort vaccine strains.
these results verify the classification of cdv-tm-cc as an asia-1 group virus. the signal peptide is a short amino acid sequence at the n-terminus of the majority of newly synthesized proteins that are destined towards the secretory pathway and is a highly divergent region [31] . analysis of the 1-135 aa signal peptide region of the f protein of cdv-tm-cc demonstrated the same set of amino acid variations in comparison with the onderstepoort strain as for the other asia-1 strains (cyn07-hv, hlj1-06, and hebei): 8 s/ 8 k, 11 t/ 11 p, 19 (fig. 6) . among the asia-2 strains (m25cr, 007lm, 011c, 50con, and 55l), variations in comparison with the onderstepoort strain were found in 30 t/ 30 s, 53 s/ 53 a, 55 r/ 55 w, 59 s/ 59 y, 62 n/ 62 k, 99 r/ 99 k, 110 i/ 110 v, and 111 n/ 111 k. additionally, both the asia-1 and asia-2 strains had clade-specific amino acid variation in 21 p/ 21 q. moreover, the cdv-tm-cc strain had characteristic additional variations in 107 p/ 107 y and 116 c/ 116 y. therefore, the signal peptide region of cdv-tm-cc has both asia group-specific and individual variations. among the cdv strains, amino acid variation was also found in 208 k/ 208 n in the f2 region (aa 136-224) for the asia-1 group. generally, there was high conservation within the hydrophobic fusion peptide (fp) domain at the n-terminus of the membrane anchored f1 subunit, with the exception of 233 a/ 233 v in the 98-2654 and 98-2646 strains (fig. 6) . amino acid variations between the asia-1 and asia-2 groups were also found in a region between the helical bundles (hb) and heptad repeats b (hrb) at 394 v/ 394 s, 429 r/ 429 k, and 466 l/ 466 i; within the trans-membrane (tm) domain at 627 c/ 627 y, 634 q/ 634 r, and 637 h/ 637 f; and within the cytoplasmic tail (ct) domain, at 656 r/ 656 k. among the asia group strains, the hra (aa 250-307) and hb (aa 328-374) domains were highly conserved, with the exception of a 280 q/ 280 a variation in the hra domain. 
likewise, the amino acids were highly conserved in the hrb (aa 557-601) domain in all cdv strains except for hebei ( 583 d/ 583 n) and 5804p ( 587 v/ 587 i). common amino acid changes in other regions of cdv strains in comparison to the onderstepoort strain were found at 317 k/ 317 r and 556 s/ 556 g. the potential n-glycosylation sites (n-x-s/t) of the f protein were highly conserved at 141 nls, 173 nvs, 179 nct, and 517 nqs in the f1 region among all cdv strains, as reported previously [32] [33] [34] (fig. 6). moreover, the asia-1 group (strains cyn07-hv, hlj1-06, hebei, and cdv-tm-cc) had specific potential n-glycosylation sites at 62 nrt and 108 nat in the signal peptide region, with the exception of the cdv-tm-cc strain, which had the sequence 62 nkt. five of these six potential glycosylation sites of the cdv-tm-cc strain were at the same positions as in the known virulent cdv strains (a75/17, 5804p and 164071), at aa 108-110, 141-143, 173-175, 179-181, and 517-519, whereas 62 nkt was unique to cdv-tm-cc, and 62 nrt and 38 nit were unique to 5804p. cysteine is an α-amino acid that plays an important role in intramolecular disulfide bond formation and the steric structure of proteins. in the f protein of cdv-tm-cc, a total of 16 cysteine residues were detected. among them, 14 residues (aa 123, 132, 180, 307, 446, 455, 470, 478, 502, 507, 509, 531, 628, and 629) were located at identical positions in all cdv strains; however, several residues were characteristic of individual strains, such as 67 r/ 67 c in the america group (strains onderstepoort, 98-2654, 98-2646, snyder hill, cdv3, shuskiy and phoca-caspian-2007) and 116 y/ 116 c in cdv-tm-cc. the presence of amino acid variations, as well as specific n-glycosylation sites and cysteine residues within the f protein, could affect the immune response to cdv-tm-cc. improved vaccination has reduced the frequency and magnitude of cd [35].
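potential n-glycosylation sites of the n-x-s/t form discussed above can be located with a simple pattern scan. the python sketch below is a minimal illustration and not the netnglyc predictor used in the study: it only matches the sequon pattern, it excludes proline at the x position (a common convention that is an assumption here, not something stated in the text), and the toy peptide is invented for the example.

```python
import re

# N-X-S/T sequon with X != P; the lookahead lets overlapping sequons be found
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position of N, tripeptide) for every N-X-S/T match."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# hypothetical toy peptide, not a cdv sequence
toy_peptide = "MANLSQPNPTAANCTK"
sites = find_sequons(toy_peptide)
print(sites)

# every tripeptide quoted in the text fits the same pattern
quoted = ["NLS", "NVS", "NCT", "NQS", "NRT", "NAT", "NKT", "NIT"]
all_quoted_match = all(find_sequons(tri) == [(1, tri)] for tri in quoted)
print(all_quoted_match)
```

a pattern match only marks a candidate site; whether a given asparagine is actually glycosylated depends on context that a regular expression cannot capture.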
distemper vaccination failures are uncommon, but outbreaks of cd continue to occur among vaccinated individuals and populations [4, 5, 36, 37]. the most common factor in cd occurrence is a lack of the h protein, a major structural protein of cdv, mediates host selection and pathogenicity, and the rate of genetic variation for its gene is greater than for other genes. because cdv lineages are geographically distinct, many studies have demonstrated that phylogenetic analysis can be based on the deduced amino acid sequences of the h protein [14, 18, 21, 38]. in this study, phylogenetic analysis based on the h protein identified seven clades of cdv strains (america-1, america-2, asia-1, asia-2, europe, arctic-like, and europe wild-life), and cdv-tm-cc was classified into the asia-1 group, with the highest identity to the chinese strains, hlj1-06 and hebei, and the japanese strain, cyn07-hv. potential n-glycosylation sites may differ between the h proteins of wild-type and vaccine strains of cdv. usually, only 4-7 potential sites are found within vaccine strains (such as onderstepoort), in comparison with 8-9 sites in wild-type cdv strains (for example, 5804p). in particular, the 309-311 n-glycosylation site, which is specific for the wild-type strain [14, 18], is suggestive of the pathogenicity of the cdv-tm-cc strain. furthermore, the 584-586 n-glycosylation site has been acquired in the asia-1 strains [18, 20]. further study may determine whether these differences in glycosylation the n protein is a highly conserved immunogenic protein that can elicit cellular and humoral immunity [39]. based on sequence differences between the gene of the wild strains and that of the vaccine strain, the n protein may affect the seroprotection rate of the host and lead to immune failure. like the h protein, the n protein of cdv-tm-cc showed the highest homology with the asia-1 group.
high homology was also observed with the asia-2 group (strains m25cr, 007lm, 011c, 50con, and 55l) and wild-type strains (164071, a75/17, and 5804p). moreover, the lowest homology was found between cdv-tm-cc and the onderstepoort strain. variation in the immunodominant epitope of the virus may change the structure, and therefore, we can speculate that the t cell-mediated immune response may be altered by variations in this protein. the p gene is extremely well conserved and, therefore, is particularly important in the phylogenetic classification. based on the phylogenetic relationship of the deduced amino acid sequence of the p protein, cdv-tm-cc was also classified into the asia-1 group. these results highlight the importance of considering the geographical setting to control the occurrence of the disease in a more efficient manner. the f protein is a surface glycoprotein that mediates viral entry into the host cell by fusion of the virion envelope and the host cell plasma membrane at a neutral ph. within the f protein, the signal peptide region (aa 1-135) has the lowest amino acid homology, especially at positions 13-37 and 72-112. however, our analysis shows that the signal peptide region is relatively well conserved among the asia-1 group, except for specific individual amino acids, indicating that the signal peptide of the f protein is geographically distinct. in addition, three amino acids specific to the cdv-tm-cc strain ( 62 k, 107 y, and 116 y) are located in the signal peptide region. a previous study reported that the amino acids 208 k and 216 l are specific for the cdv vaccine strains; however, we also found 208 k in the wild-type strains in the america group (a75/17, 164071, and 5804p) and asia-2 group (011c, m25cr, 55l, 50con, and 007lm). the f protein of the cdv-tm-cc strain has six potential glycosylation sites.
among them, differences were found to reside mainly in the signal peptide region, but no clear rule could explain the differences between the wild-type and vaccine strains or the geographical variation, including the occurrence of a strain-specific site (62-64 nkt) for cdv-tm-cc. four additional potential glycosylation sites were recognized at positions 141-143, 173-175, 179-181 in the f2 region and 517-519 in the f1 region, as reported previously [32] [33] [34]. the 108-110 n-glycosylation site is specific for the wild-type strains (5804p, a75/17, and 164071) and the asia-1 group (hebei, hlj1-06, and cyn07-hv), and may be another important factor in vaccination failure. the fusion peptide (fp) domain was also found to be highly conserved among all cdv strains, except for 233 a/ 233 v in 98-2654 and 98-2646. in short, the genetic/antigenic drift observed in the currently circulating cdv strains should be considered as a possible factor leading to the resurgence of cd cases. analysis of cdv strains detected globally and from a variety of host species will provide a more in-depth understanding of the global ecology of cdv and will provide the basis for the improvement of current cdv vaccines. the wild-type cdv-tm-cc strain, originally isolated from spleen homogenate from a fully vaccinated tibetan mastiff in china, was classified into the asia-1 group cluster of cdv strains based on the sequence of its h protein, and this classification was verified by the sequence of its p protein. variations in specific amino acid residues, n-glycosylation sites, and cysteine residues throughout the cdv-tm-cc genome may explain the failure of the dog to mount vaccine-mediated protection against cd. these results provide the foundations for the global improvement of current cdv vaccines. acknowledgments this work was supported by ecology of zoonoses and research of infection and immunity mechanisms (2012cb722501).
key: cord-310298-26x2p9wc authors: tao, pan; dai, li; luo, mengcheng; tang, fangqiang; tien, po; pan, zishu title: analysis of synonymous codon usage in classical swine fever virus date: 2008-10-29 journal: virus genes doi: 10.1007/s11262-008-0296-z sha: doc_id: 310298 cord_uid: 26x2p9wc using the complete genome sequences of 35 classical swine fever viruses (csfv) representing all three genotypes and all three kinds of virulence, we analyzed synonymous codon usage and the relative dinucleotide abundance in csfv. the general correlation between base composition and codon usage bias suggests that mutational pressure rather than natural selection is the main factor that determines the codon usage bias in csfv. furthermore, we observed that the relative abundance of dinucleotides in csfv is independent of the overall base composition but is still the result of differential mutational pressure, which also shapes codon usage. in addition, other factors, such as the subgenotypes and aromaticity, also influence the codon usage variation among the genomes of csfv. this study represents the most comprehensive analysis to date of csfv codon usage patterns and provides a basic understanding of the mechanisms for codon usage bias. synonymous codons are not used randomly. rather, some codons are used more frequently than others. mutational pressure and translational selection were thought to be the main factors that account for codon usage variation among genes in different organisms [1] [2] [3] [4] . understanding the extent and causes of biases in codon usage is essential to the understanding of viral evolution, particularly the interplay between viruses and the immune response [5] .
however, in contrast to many organisms such as bacteria, yeast, drosophila, and mammals, where codon usage bias and nucleotide composition have been studied in great detail [6], the factors shaping synonymous codon usage bias and nucleotide composition in viruses, especially in animal viruses, have been studied only to a limited extent. for human rna viruses, it has been observed that codon usage bias is related to mutational pressure, g + c content, the segmented nature of the genome and the route of transmission of the virus [7]. for some vertebrate dna viruses, genome-wide mutational pressure, rather than natural selection for specific coding triplets, is the main determinant of codon usage [5]. analysis of the bovine papillomavirus type 1 (bpv1) late genes has revealed a relationship between codon usage and trna availability [8]. in the mammalian papillomaviruses, it has been proposed that differences from the average codon usage frequencies in the host genome strongly influence both viral replication and gene expression [9]. codon usage may play a key role in regulating latent versus productive infection in epstein-barr virus [10]. recently, it was reported that codon usage is an important driving force in the evolution of astroviruses and small dna viruses [11, 12]. clearly, studies of synonymous codon usage in viruses can reveal much about the molecular evolution of viruses or individual genes. such information would be relevant in understanding the regulation of viral gene expression. to date, little codon usage analysis has been performed on classical swine fever virus (csfv), which is the pathogen that causes classical swine fever (csf), an economically important and highly contagious disease of swine. although eradicated from many countries, csf continues to cause serious problems in different parts of the world [13].
csfv is an enveloped virus with a single-stranded rna genome, which contains a single open reading frame (orf) encoding a polyprotein that, following cellular and viral protease-mediated co- and post-translational processing, gives rise to 11-12 final cleavage products [14]. studies on the phylogenetic relationship of csfvs have divided the viruses into 3 main genotypes and 10 subgenotypes based on sequence comparisons of 190 nt of e2 sequence [15]. based on differences in virulence, csfvs can also be divided into three clusters, namely, highly virulent strains, moderately virulent strains, and avirulent strains [16]. recently, we have analyzed the positive selection pressure acting on the csfv envelope protein genes, e rns , e1, and e2, and identified several specific codons subject to diversifying positive selection in e rns and e2 [17]. in order to better understand the characteristics of the csfv genome, we have analyzed its codon usage and dinucleotide composition. in this report, we sought to address the following issues concerning codon usage in csfv: (i) the extent and causes of codon bias in csfv; (ii) the relationship between csfv genotype and codon usage; and (iii) how csfv virulence might affect codon usage.
three complete genomes of csfv were previously sequenced by our laboratory (af407339, af091507, and af092448) [18, 19]. the other available complete cds of csfv were downloaded from genbank in march 2008, and sequences with >99% sequence identity were excluded. a total of 35 csfv genomes [18-33] representing 6 subgenotypes (1.1, 1.2, 2.1, 2.2, 2.3, and 3.4) and all 3 kinds of virulence (highly virulent strains, moderately virulent strains, and avirulent strains) were used in this study. the genotyping of the 35 csfv genomes was performed using the csfv sequence database (http://viro08.tiho-hannover.de/eg/eurl_virus_db.htm) based on 190 nt of e2 sequence [34]. the serial number (sn), mononucleotide composition of each genome, genbank accession numbers, subgenotype, virulence, and other detailed information are listed in table 1.
relative synonymous codon usage (rscu) values of each codon in each orf were used to measure the synonymous codon usage [35]. rscu values are largely independent of amino acid composition and are particularly useful in comparing codon usage between genes, or sets of genes that differ in their size and amino acid composition. the effective number of codons (enc), which is the best overall estimator of absolute synonymous codon usage bias [37], was used to quantify the codon usage bias of an orf [36]. enc values range from 20 to 61. the larger the extent of codon preference in a gene, the smaller the enc value. in an extremely biased gene where only one codon is used for each amino acid, this value would be 20; in an unbiased gene, it would be 61. the index gc3s was used to calculate the fraction of the nucleotides g + c at the synonymous third codon position (excluding met, trp, and the termination codons). similarly, gc12s is the fraction of the nucleotides g + c at the synonymous first and second positions. the general average hydrophobicity (gravy) score and the frequency of aromatic amino acids (aromo) in the hypothetical translated gene product were also computed. all the indices mentioned above were calculated using the program codonw, version 1.4.
the relationships between variables and samples can be explored using multivariate statistical analysis. correspondence analysis (coa) was used to study the major trend in codon usage variation among orfs. in order to minimize the effects of amino acid composition on codon usage, each orf is represented as a 59-dimensional vector; each dimension corresponds to the rscu value of one sense codon (excluding aug, ugg, and stop codons).
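the two sequence-level indices defined above can be illustrated with a short python sketch. this is not the codonw implementation used in the study; it only follows the definitions given in the text (rscu as the observed codon count divided by the mean count of its synonymous family, and gc3s as the g + c fraction at third positions of codons other than met, trp and stops). the function names and the choice to report 0.0 for amino acids that are absent from the orf are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid ('*' marks stop codons).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(orf):
    """Count in-frame codons of an ORF given as RNA or DNA text."""
    orf = orf.upper().replace("U", "T")
    usable = len(orf) - len(orf) % 3
    codons = (orf[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(orf):
    """Return RSCU values for the 59 sense codons with synonymous alternatives."""
    counts = count_codons(orf)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # skip stop codons, Met (ATG) and Trp (TGG)
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Assumption: an amino acid absent from the ORF yields 0.0.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def gc3s(orf):
    """Fraction of G + C at the third position of synonymous codons."""
    counts = count_codons(orf)
    syn = {c: n for c, n in counts.items() if CODON_TABLE[c] not in "*MW"}
    total = sum(syn.values())
    gc = sum(n for c, n in syn.items() if c[2] in "GC")
    return gc / total if total else 0.0
```

calling `rscu` on each of the 35 polyprotein orfs would give the 59-dimensional vectors that serve as input to the correspondence analysis; the analysis itself is not sketched here.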
major trends within this dataset can be determined using measures of relative inertia and genes ordered according to their positions along the axis of major inertia. the relative abundance of dinucleotides in the csfv orfs was assessed using the method described by karlin and burge [38]. the odds ratio $\rho_{xy} = f_{xy}/(f_x f_y)$, where $f_x$ denotes the frequency of the nucleotide x and $f_{xy}$ the frequency of the dinucleotide xy, was calculated for each dinucleotide. as a conservative criterion, for $\rho_{xy} > 1.23$ (or $\rho_{xy} < 0.78$), the xy pair is considered to be of high (or low) relative abundance compared with a random association of mononucleotides [38].
statistical analysis
correlation analysis was carried out using spearman's rank correlation analysis method. all statistical analyses, as well as cluster analysis, were carried out using the statistical analysis software spss version 15.0.
in order to investigate the extent of codon bias in csfv, the rscu values of different codon in each orf was
to investigate synonymous codon usage variation among csfv viruses, coa was implemented for all 35 csfv orfs selected for this study. figure 1 depicts the position of each orf on the plane defined by the first and second principal axes generated by coa on rscu values of orfs. the first principal axis accounts for 36.87% of the total variation. the next three axes account for 19.54%, 8.79%, and 7.54% of the variation, respectively. this observation indicates that although the first major axis explains a substantial amount of variation in trends in codon usage, the second major axis also has an appreciable impact on total variation in synonymous codon usage. it is worth noting that several csfv chinese c strains that can replicate efficiently in rabbits but not in swine have similar coordinates (fig. 1) to two csfv riems strains, which can replicate efficiently in swine.
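the dinucleotide odds ratio described in the methods (observed dinucleotide frequency divided by the product of its two mononucleotide frequencies, with the 0.78 and 1.23 cut-offs) can be sketched in python as follows. the function names are hypothetical, and the sketch scores a single strand as given, which is an assumption appropriate for a single-stranded rna genome rather than a statement of how the original analysis was run.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGU":
        for y in "ACGU":
            f_x, f_y = mono[x] / n, mono[y] / n
            f_xy = di[x + y] / (n - 1)
            ratios[x + y] = f_xy / (f_x * f_y) if f_x and f_y else float("nan")
    return ratios


def classify(rho, low=0.78, high=1.23):
    """Apply the conservative abundance criterion quoted in the text."""
    if rho > high:
        return "high"
    if rho < low:
        return "low"
    return "normal"
```

with these thresholds, the mean cpg ratio of 0.426 reported for the 35 genomes would be classified as low relative abundance.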
this suggests that the host may not influence the codon usage bias between the csfv c strain and other csfv strains. in fact, our study demonstrated that a 12-nt insertion (cuuuuuucuuuu) at position 61 of the 3′ utr may be responsible for the characteristics of the csfv chinese c strain [44].
mutational pressure is the main factor accounting for codon usage variation in csfv
mutational pressure and translational selection are thought to be the main factors that account for codon usage variation in different organisms [1-4]. hence, to establish which factor explains codon usage in csfv, the g + c content at the first and second codon positions (gc12s) was first compared with that at the synonymous third position (gc3s). it was found that gc12s and gc3s are significantly correlated (r = 0.483, p < 0.01). this suggests that they are most likely the result of mutational pressure, as natural selection would be expected to act differently on different codon positions. additionally, wright [36] suggested that the enc-plot (enc plotted against gc3s) be used as part of a general strategy to investigate patterns of synonymous codon usage. genes whose codon choice is constrained only by a g + c mutation bias will lie on or just below the curve of the predicted values. as shown in fig. 2, all of the points lie below the expected curve, indicating that the codon usage bias in these 35 genomes is greatly influenced by the g + c compositional constraints. furthermore, the correlation between the first or second axis values in coa and gc12s or gc3s values of each strain was analyzed. as shown in table 4, the first axis value in coa of each selected genome, which contains most of the variation in synonymous codon usage bias between these genomes, is closely correlated with the gc composition at the first, second, and third codon positions. the second axis in the coa of each gene is also closely correlated with gc12s.
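the expected curve of the enc-plot mentioned above is usually drawn from wright's closed-form approximation, enc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the gc3s value. the python sketch below evaluates that curve and the gap between expected and observed enc; the function names are hypothetical and the example values in the usage note are invented for illustration, not taken from table 1.

```python
def expected_enc(gc3s):
    """Expected ENC under G + C mutational bias alone (Wright's approximation)."""
    s = gc3s
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


def enc_gap(observed_enc, gc3s):
    """Positive values mean the gene lies below the expected curve."""
    return expected_enc(gc3s) - observed_enc
```

for example, `enc_gap(51.0, 0.45)` is positive, which corresponds to a point lying below the curve in an enc-plot such as fig. 2.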
this analysis indicated that most of the codon usage bias among different orfs is directly related to the nucleotide composition. therefore, the compositional constraint is the main determinant of the variation in synonymous codon usage among different csfv orfs.
the relative abundance of dinucleotides and cpg suppression also shape the codon usage in csfv
it has been reported that dinucleotide biases can affect codon bias. to study the possible effect of the composition of dinucleotides on codon usage in csfv, the relative abundances of the 16 dinucleotides in the 35 csfv genomes were calculated. as shown in table 2, the frequencies of occurrence for dinucleotides were not randomly distributed and no dinucleotides were present at the expected frequencies. the relative abundance of cpg showed the most marked deviation from the ''normal range'' (mean ± s.d. = 0.426 ± 0.018). the relative abundance of upg and cpc also showed slight deviation from the ''normal range'' (mean ± s.d. = 1.250 ± 0.018 and 1.262 ± 0.019, respectively). among the 16 dinucleotides, 6 are correlated with the first axis value in coa and 8 are correlated with the second axis value in coa (table 3). these observations indicated that the composition of dinucleotides, which is independent of the overall base composition but still the result of differential mutational pressure, also determines the variation in synonymous codon usage among different csfv orfs. in the remaining four cpc-containing codons for proline, cca (mean 1.520) is markedly over-used; ccg (mean 0.676), which is also a cpg-containing codon, is slightly suppressed; ccu (mean 0.933) and ccc (mean 0.871) are almost equally used.
the effect of selection pressure on codon usage
as shown in fig. 2, the majority of the actual enc values are slightly lower than the expected enc values.
this implies that although codon bias is mainly explained by mutational pressure, there are other factors, with less of an effect, that also influence the codon bias. to test whether any selection pressure contributes to the codon usage variation between these csfvs, we performed a correlation analysis between axis values in coa and the aromaticity or gravy score of each polyprotein. it was found that both axis 1 and axis 2 are significantly correlated with the aromaticity score (r = -0.526, p < 0.01 and r = 0.473, p < 0.01, respectively), indicating that the frequency of aromatic amino acids (phe, tyr, trp) in the hypothetical translated gene product of each orf is also related to the observed variation in codon bias. no significant relationship was found between axis values in coa and gravy using spearman's correlation (table 4).
beyond the factors mentioned above, we were also concerned with how csfv genotype and virulence might affect codon usage. based on the variation in rscu values among the 35 csfv genomes, a cluster tree was generated by the hierarchical clustering method. as shown in fig. 3, these 35 csfv genomes were divided into 7 sublineages. sublineages i-1 and i-2 contain all subgenotype 1.1 strains, and sublineage i-2 contains almost all avirulent strains in genotype 1.1. sublineages i-3, ii-1, ii-2, ii-3, and ii-4 contain the subgenotypes 1.2, 2.1, 2.3, 3.4, and 2.2, respectively. it should be noted that the distance between sublineages ii-2 and ii-3 is smaller than the distance between sublineages ii-2 and ii-4 (fig. 3). since sublineages ii-2 and ii-4 contain the subgenotypes 2.3 and 2.2, respectively, which both belong to genotype 2, the distance between these two sublineages would be expected to be smaller than the distance between sublineage ii-2 and sublineage ii-3 (which contains subgenotype 3.4). this discrepancy may be due to the special characteristics of strain 39 in subgenotype 2.2 (see discussion).
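the correlations reported in this section were obtained with spearman's rank method in spss. as a rough illustration of what that coefficient measures, the python sketch below ranks both variables (ties receive the average of the ranks they span) and then takes the pearson correlation of the ranks. it returns only the coefficient, not the p value, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

in this setting `x` would hold, for instance, the axis 1 coordinates of the 35 orfs and `y` their aromaticity scores.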
mean values of 35 csfvs relative dinucleotide ratios ± s.d.
table 3 summary of correlation analysis between the first two axes in coa and sixteen dinucleotides in the selected viruses
, indicating that the overall extent of codon usage bias in csfv genomes is low. in fact, jenkins et al. [7] have previously reported that the overall extent of codon usage bias in rna viruses is low, with an average enc value close to 45. nevertheless, we still wished to determine the factors that constrain codon usage in csfv. according to the selection-mutation-drift model [35, 45], mutational pressure and translational selection are generally thought to be the main factors that account for codon usage variation between genes in different organisms [1-4]. in our study, the general correlation between codon usage bias and base composition we observed suggests that mutational pressure is the main factor that determines codon usage bias in csfv; this conclusion is also supported by the highly significant correlation between gc12s and gc3s (r = 0.483, p < 0.01), and the result of the enc-plot (fig. 2). since mutation rates in rna viruses are much higher than those in dna viruses [46], it is understandable that mutational pressure is the major cause of codon usage bias in the 35 csfv strains included in this study. the majority of the actual enc values are slightly lower than the expected enc values (fig. 2), indicating that there are other factors, albeit with smaller effects, that also influence codon bias. we then asked how csfv genotype and virulence might affect codon usage. our cluster analysis revealed that the csfv genotype also constrains codon usage, since different csfv strains with the same genotype were clustered together with only one exception, csfv strain 39 (fig. 3). csfv strain 39 (af407339) was, however, postulated to be a recombinant virus by he et al. [47].
to date, phylogenetic analyses have been performed largely on one or three genomic regions but not on the complete genome, which might limit their ability to genotype recombinant viruses. on the other hand, our rscu-based clustering was based on the complete cds of each virus. therefore, it is expected that differences will arise between analyses of recombinant viruses using the two different clustering methods. our results suggest that csfv strain 39 might indeed be a recombinant virus and also raise interesting questions about csfv evolution and the relative contribution of intertypic recombination to the generation of csfv genetic diversity. furthermore, our results indicate that virulence is not significantly influenced by codon bias, since not all avirulent strains were clustered together. although 9 of the 11 avirulent strains of subgenotype 1.1 were clustered together (fig. 3, subgenotype 1.1b), the other avirulent strains were clustered with highly virulent strains, and the 5 moderately virulent strains were also not clustered together (fig. 3). at present, however, only small numbers of complete cds of csfv are available, and these cover only six subgenotypes. clearly, more complete sequences are needed to allow us to make more precise judgments.
because cpg under-representation has previously been reported in rna and small dna viruses [10], we wanted to determine whether the relative abundances of dinucleotides in csfv affect codon usage. the frequencies of occurrence for dinucleotides were not randomly distributed and no dinucleotides were present at the expected frequencies (table 2). the general correlation between the axis values in coa and the relative dinucleotide abundances (table 3) suggests that codon usage in csfv can also be strongly influenced by underlying biases in dinucleotide frequencies. as a case in point, all cpg-containing codons are markedly suppressed. the marked cpg deficiency is a common phenomenon in small eukaryotic viruses [48, 49].
the cpg deficiency was proposed to be related to the immunostimulatory properties of unmethylated cpgs, which are recognized by the host's innate immune system as a pathogen signature [5, 49]. indeed, unmethylated cpg motifs in dna sequences can be recognized by tlr9 [50], and unmethylated cpg motifs in ssrna may stimulate monocytes through a novel mechanism [51]. this notion was further supported by the fact that cpg is not suppressed in the genomes of most large viruses [48, 49], because they might encode a range of proteins that interfere with cellular pathogen recognition. as a case in point, vaccinia poxvirus encodes antagonists of tlrs [52]. in csfv, ruggli et al. and our group have shown that the n pro and e rns proteins can prevent both poly(ic)- and ndv-mediated ifn-α/β induction [53-56]. inhibition by the n pro protein is thought to involve an inactivation of interferon regulatory transcription factor 3 (irf-3) [57]. however, no evidence has been found to support the notion that the n pro and e rns proteins interfere with the recognition of unmethylated cpg motifs in ssrna. it is therefore likely that codon usage bias in csfv is also related to the innate immune selective forces of its host.
taken together, our study reveals that codon usage bias in csfv is slight and that mutational pressure is the main factor affecting codon usage variation in csfv. other factors, such as dinucleotide composition, genotype, aromaticity, and even innate immune selective forces, also significantly influence codon usage bias. however, due to a lack of sequence data and detailed information about these isolates, it is currently impossible to perform an exhaustive analysis of csfv codon usage. clearly, a more comprehensive analysis is needed, based on more available data, to reveal more about the viral genome.
to our knowledge, this work is the first report of codon usage analysis in csfv, and it provides a basic understanding of the mechanisms that give rise to codon usage bias. the results we have reported are also useful in understanding the processes involved in csfv evolution.
key: cord-025704-icedihm2 authors: pawestri, hana a.; nugraha, arie a.; han, alvin x.; pratiwi, eka; parker, edyth; richard, mathilde; van der vliet, stefan; fouchier, ron a. m.; muljono, david h.; de jong, menno d.; setiawaty, vivi; eggink, dirk title: genetic and antigenic characterization of influenza a/h5n1 viruses isolated from patients in indonesia, 2008–2015 date: 2020-06-01 journal: virus genes doi: 10.1007/s11262-020-01765-1 sha: doc_id: 25704 cord_uid: icedihm2
since the initial detection in 2003, indonesia has reported 200 human cases of highly pathogenic avian influenza h5n1 (hpai h5n1), associated with an exceptionally high case fatality rate (84%) compared to other geographical regions affected by other genetic clades of the virus. however, there is limited information on the genetic diversity of hpai h5n1 viruses, especially those isolated from humans in indonesia. in this study, the genetic and antigenic characteristics of 35 hpai h5n1 viruses isolated from humans were analyzed. full genome sequences were analyzed for the presence of substitutions in the receptor binding site and polymerase complex, as markers for virulence or human adaptation, as well as antiviral drug resistance substitutions. only a few substitutions associated with human adaptation were observed, with a remarkably low prevalence of the human adaptive substitution pb2-e627k, which is common during human infection with other h5n1 clades and is a known virulence marker for avian influenza viruses during human infections.
in addition, the antigenic profile of these indonesian hpai h5n1 viruses was determined using serological analysis and antigenic cartography. antigenic characterization showed two distinct antigenic clusters, as observed previously for avian isolates. these two antigenic clusters were not clearly associated with time of virus isolation. this study provides better insight into the genetic diversity of h5n1 viruses during human infection and the presence of human adaptive markers. these findings highlight the importance of evaluating virus genetics for hpai h5n1 viruses to estimate the risk to human health and the need for increased efforts to monitor the evolution of h5n1 viruses across indonesia.
electronic supplementary material: the online version of this article (10.1007/s11262-020-01765-1) contains supplementary material, which is available to authorized users.
highly pathogenic avian influenza (hpai) viruses are a global concern for both animal and human health [1]. hpai h5n1 viruses of the a/goose/guangdong/1/1996 lineage were first discovered in 1996 in china and since have continued to circulate in poultry and wild birds across asia, the middle east, europe and africa [2, 3]. in total, 68 countries have been affected and millions of birds have succumbed to the disease or have been culled to prevent further spread of the disease [4]. hpai h5n1 viruses infect humans sporadically and may cause severe disease with a high case fatality rate among confirmed hospitalized patients. to date, there are 861 confirmed human cases, of which 455 have died [5]. hpai h5n1 viruses continue to evolve through genetic drift and reassortment events with other avian influenza a viruses, resulting in multiple genetic clades and subtypes [6, 7].
edited by william dundon.
hpai h5n1 viruses were first identified in indonesia from poultry outbreaks on java island in 2003 and have since spread to other parts of the country [8, 9]. during subsequent years, clade 2.1 viruses became enzootic in indonesia [10]. however, a new hpai h5n1 clade 2.3 virus has been detected in poultry outbreaks since 2012. so far, 200 human infections with hpai h5n1 viruses have been reported in indonesia, with a higher case fatality rate among reported cases (84%) than in the rest of the world afflicted by the virus. we previously showed that high nasopharyngeal viral load was associated with more severe outcome of human h5n1 infections in indonesia and that the virus was more commonly detected in blood relative to other geographical regions affected by hpai h5n1 [11, 12]. strikingly, although the number of detections in humans peaked in 2005-2006 in indonesia and subsequently declined in the following years, the case fatality rate in indonesia increased over time from 65% in 2005 to 100% since 2012. this increase was associated with higher viral load prior to treatment and the presence of mutations in the matrix protein that confer adamantane resistance [11]. however, reasons for the higher viral load and case fatality rate are still unclear. more detailed sequence analyses are warranted to investigate the presence of known virulence markers and substitutions related to possible human adaptation, which can help explain the higher viral loads and increased mortality. in response to the outbreaks, the indonesian government implemented a strategy to reduce the incidence of hpai h5n1 virus infections in poultry, including stamping out of infected poultry, culling of contiguous flocks and poultry vaccination [13]. several vaccines were developed and implemented over time to match the strains circulating in poultry and to support pandemic preparedness [14, 15].
the initial vaccine used was based on the a/chicken/legok/2003 isolate, a clade 2.1.1 virus [13, 16]. by 2010, the vaccine strain was updated to subclade 2.1.2 and 2.1.3 viruses, based on isolates a/chicken/west java/30/07 and a/chicken/nagrak/30/07, respectively [17]. more recently, a new subclade of h5n1 has emerged (2.3.2.1) and a new vaccine was developed based on isolate a/duck/sukoharjo/bbvw-1428-9/2012 [10]. however, as a consequence of the large-scale vaccination, antigenic drift was induced in poultry and consequently the vaccines became less effective [18]. despite vaccination efforts, the number of poultry outbreaks remained high and the epidemic in poultry continued to spread among 32 out of 34 indonesian provinces, with over 11,000 reported poultry outbreaks since 2007 [19]. the large number of poultry outbreaks continues to pose a threat of future zoonotic infections in humans, antigenic drift and possible host adaptations that could increase the pandemic risk of circulating viruses [20, 21]. improved insights into the genetic and antigenic characteristics of hpai h5n1 viruses from indonesia provide a better understanding of their epidemiology and the high case fatality rate, and support pandemic risk assessment [22]. sequencing data contain valuable information about viral genetic characteristics, including presence of known human adaptive markers, resistance against available antiviral drugs or other changes that can explain the high and rising mortality, while antigenic characterization will help assess the potential protection of pre-pandemic vaccines. here, we conducted a study to characterize the viral genetics of hpai h5n1 viruses isolated from patients in indonesia between 2008 and 2015 that could explain the virulence leading to the high case fatality rate. we investigated the presence of known molecular determinants of virulence, receptor binding properties and antiviral susceptibility using whole genome sequencing of the hpai h5n1 viruses.
in addition, we investigated the antigenic properties of these human virus isolates and compared them to previous antigenic changes in hpai h5n1 viruses from poultry, in order to assess the usefulness of and protection by current available pre-pandemic virus vaccines. as part of the national procedure for avian influenza case investigation in indonesia, respiratory specimens were collected from suspected h5n1 cases admitted to hospitals throughout indonesia and sent to the national reference laboratory for influenza at the national institute of health research and development (nihrd) in jakarta. suspected cases were defined according to world health organization (who) criteria [23] . the nihrd is the reference laboratory under the indonesian ministry of health responsible for laboratory testing and event-based surveillance of emerging infectious diseases in humans, including avian influenza a/h5n1 virus. because indonesian clinical specimens are obtained from suspected h5n1 cases as part of the national outbreak procedure for hpai h5n1 case investigations, requirement for informed consent has been waived by the indonesian ministry of health. the specimens and data were collected from january 2008 to december 2015, according to the national outbreak investigation protocol following circulation of hpai h5n1 viruses in south east asia [24] . all of the specimens collected were stored and analyzed at the nihrd. laboratory identification and confirmation was determined using realtime reverse transcriptase-polymerase chain reaction (rt-pcr) typing and subtyping assay according to the centers for disease control (cdc) (atlanta, united states) protocol [25] . specimens for all laboratory-confirmed cases were selected for subsequent virus isolation and genetic analyses, based on the specimen with the lowest cycle threshold (c t ) value according to the real-time rt-pcr, available for each patient. 
the 71 selected specimens positive for influenza a(h5n1) virus with a ct value below 35 were grown in 9- to 10-day-old specific pathogen-free (spf) embryonated chicken eggs in a biosafety level 3 (bsl3) facility [26]. after incubation at 37 °c for 30 h, the egg allantoic fluid was harvested and hemagglutination titers were determined by hemagglutination assay. a total of 35 positive cultures were obtained; the remaining cultures failed owing to variable specimen quality and limited availability of specimen volumes. the viral rna was extracted from 200 µl of influenza virus positive allantoic fluid using the high pure rna isolation kit (roche) with on-column dnase treatment according to the manufacturer's instructions. the rna was reverse transcribed into cdna using the uni12m primer (agc raa agc agg) [27] and superscript iii reverse transcriptase (invitrogen, carlsbad, ca, usa) according to the manufacturer's protocol. pcr amplification was performed using gene-specific whole-genome degenerate primer sets (primer sequences available upon request) [28-30] and platinum taq dna polymerase high fidelity (invitrogen). the pcr products were then purified with the exosap-it purification kit (affymetrix, inc, santa clara, ca) according to the manufacturer's protocol. the complete coding sequences were sequenced using the big dye terminator v3.1 cycle sequencing kit (applied biosystems, foster city, ca, usa). the products of the sequencing reactions were cleaned using the big dye x terminator kit (applied biosystems, foster city, ca, usa) according to the manufacturer's instructions and sequenced in a 16-capillary 3130xl genetic analyzer (applied biosystems, foster city, ca, usa). all nucleotide sequences obtained from this study have been deposited in the gisaid database (see supplemental table s2). the assembly and editing process of sequences from all eight gene segments was performed using codon code software (gene codes, usa).
all sequences were aligned using clustalw as available within bioedit software version 7.0.8.0 [31]. to infer the evolutionary relationships between the viruses, maximum likelihood (ml) phylogenetic trees were constructed using raxml 8.2.12 with the gtrgamma nucleotide substitution model [32, 33]. a ml phylogenetic tree was constructed using the combined nucleotide alignment of hemagglutinin (ha) sequences from the newly sampled viruses and reference sequences used to define the h5 nomenclature system (https://www.who.int/influenza/gisrs_laboratory/201101_h5smalltreealignment.txt; fig. 1) [34, 35]. sequence data of human and avian h5n1 viruses from indonesia with all eight influenza virus gene segments (200 viral isolates as of january 2020) were downloaded from the gisaid epiflu database [36]. individual ml trees were reconstructed for each gene segment to compare the genetic diversity of the newly sampled viruses against those previously collected from indonesia (fig. s1). tanglegrams were visualized using the baltic toolkit (https://github.com/evogytis/baltic). amino acid sequences were analyzed to identify substitutions potentially linked to human adaptation, virulence, antiviral resistance and antigenic properties as listed in the cdc h5n1 genetic change inventory [37]. in addition to this inventory, we also used flusurver to identify potentially relevant substitutions present in our sequence dataset (https://www.gisaid.org, https://flusurver.bii.a-star.edu.sg). flusurver is a web-based tool to rapidly screen sequences for potential mutations based on the curated and published literature. virus titers were determined by hemagglutination assay and antigenic characterization was performed by hemagglutination inhibition (hi) assays according to who protocols [38, 39]. the ferret antisera specifically reactive to defined h5 hemagglutinin clades were raised as described previously [40].
all antisera were pretreated overnight at 37 °c with receptor destroying enzyme (rde vibrio cholerae neuraminidase), followed by inactivation for 1 h at 56 °c. the hi assays were performed using the following procedure: twofold serial dilutions of 50 µl antisera starting at 1:20 were mixed with 25 µl of virus containing 4 hemagglutinating units (hau) and were incubated at 37 °c for 30 min. then, 25 µl of 1% turkey erythrocytes was added and incubated at 4 °c for 1 h. the hi titer was determined as the reciprocal value of the highest serum dilution that completely inhibited the hemagglutination of the turkey erythrocytes. antigenic properties were determined for 25 representative novel isolates. selection was based upon available ha titer of virus stocks and availability of at least two independent replicate experiments, measuring hi titers for all available ferret sera.
figure caption (partial): of sample collection. who reference strains are used to define the h5 nomenclature system [34, 35]
analysis of antigenic properties was conducted using antigenic cartography methods as described previously [40, 41]. briefly, the hi titers are converted to a distance matrix in which the distance between one antigen and one antiserum corresponds to the difference between the log 2 value of the maximum observed titer to the antiserum from any antigen and that of the titer of the antigen to the antiserum. this distance matrix is used as input for multidimensional scaling algorithms, which arrange the antiserum and antigen points in space to best satisfy the target distances specified by the hi data by minimizing the error. therefore, the distances between the points in an antigenic map represent antigenic distance as measured by the hi assay, in which the distances between antigens (virus isolates) and antisera are inversely related to the log 2 hi titer.
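the conversion from hi titers to target distances described above can be sketched in a few lines of python. this is an illustration of the stated definition only (log 2 of the highest titer seen for a serum minus log 2 of the titer of the antigen against that serum), not the cartography software used in the study. the function names, the nested-dictionary input layout and the virus and serum labels in the usage note are all hypothetical, and titers below the detection limit are not handled.

```python
import math


def target_distances(hi_titers):
    """Map (antigen, serum) -> target distance from a table of HI titers.

    hi_titers: dict of antigen -> dict of serum -> titer (reciprocal dilution).
    """
    best = {}  # highest log2 titer observed for each serum
    for row in hi_titers.values():
        for serum, titer in row.items():
            best[serum] = max(best.get(serum, float("-inf")), math.log2(titer))
    return {
        (antigen, serum): best[serum] - math.log2(titer)
        for antigen, row in hi_titers.items()
        for serum, titer in row.items()
    }


def map_error(antigen_xy, serum_xy, targets):
    """Sum of squared differences between target and map distances.

    A multidimensional scaling routine would move the points to minimize this.
    """
    return sum(
        (d - math.dist(antigen_xy[a], serum_xy[s])) ** 2
        for (a, s), d in targets.items()
    )
```

for example, with `{"virus_a": {"serum_a": 1280, "serum_b": 80}, "virus_b": {"serum_a": 160, "serum_b": 640}}`, each virus sits at distance 0 from its high-titer serum and 3 antigenic units (three twofold dilutions) from the other.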
although only distances between antigens and antisera are measured in the hi assay, antigenic maps allow the indirect measure of antigenic distances between two viruses.
during the course of this study between 2008 and 2015, over 8000 poultry outbreaks of hpai h5n1 viruses were reported [42] and 82 laboratory-confirmed cases of human hpai h5n1 virus infection were identified. of these 82 cases, we successfully cultured 35 virus isolates for genetic and antigenic characterization. table 1 shows a summary of the epidemiological and other data of these 35 cases. among the 35 patients, the median age was 21 (range 2-40), 14 (60%) were male, and 21 (40%) were female, and 18 (51%) received oseltamivir treatment. specimens were collected at a median of 8 days post onset of symptoms (range 0-17). there were a total of 6 specimens collected in 2008, 6 specimens in 2009, 5 specimens in 2010, 5 specimens in 2011, 5 specimens in 2012, 5 specimens in 2013, 2 specimens in 2014 and 2 specimens in 2015. these samples were collected from regions with a high incidence of poultry h5n1 outbreaks [16, 21], including west java (31%), followed by jakarta (26%) and the banten province (14%). a ml phylogenetic tree based on hemagglutinin (ha) sequences was constructed to infer the evolutionary relationships between the newly isolated viruses and hpai h5n1 viruses circulating globally (see "materials and methods"; fig. 1). to further elucidate the phylogenetic relationships between the novel h5n1 viruses and those collected from indonesia previously, ml phylogenetic trees were constructed for each individual influenza virus gene segment (fig. s2). there were also no clear distinct phylogenetic groupings between human and avian viruses in any of the gene segments analyzed, indicating that viruses infecting both host types in indonesia were genetically similar (fig. s1d).
next, we compared the amino acid substitutions found in these newly isolated viruses against molecular markers known to alter viral phenotypes such as virulence, drug resistance and human host adaptation (table s1) [43] [44] [45] [46] [47]. the ha protein can affect the virulence and host range of hpai h5n1 viruses through (1) the presence of a multibasic cleavage site, as well as changes to (2) host cell receptor specificity, (3) n-linked glycosylation patterns and (4) ha stability. the pathogenicity of avian influenza viruses is determined by the cleavability of the ha glycoprotein. the presence of multiple basic amino acid residues at the cleavage site of ha allows the glycoprotein to be cleaved into mature subunits ha1 and ha2 by furin-like proteases, which are ubiquitously expressed. in contrast, ha containing a single basic residue is cleaved by trypsin-like proteases, which are predominantly expressed in the respiratory and intestinal tract of birds and the respiratory tract of humans. all of the new isolates analyzed in this study are highly pathogenic avian influenza viruses that encode a multibasic ha cleavage site [48, 49]. the cleavage site motif pqresrrkkr↓g was found in 30 of the 35 newly isolated viruses, while other variations (i.e., pqregrrkkr↓g, pqreskrkkr↓g, pqresrrrkr↓g and pqresrrkrr↓g) were observed in the remaining isolates. another key feature of ha related to both virulence and human adaptation is the receptor specificity. conserved residues within the receptor binding site (rbs) of ha are required for binding to sialic acid receptors (sia), while several other residues in domains surrounding the rbs are key determinants of receptor specificity. residues in these domains, the 130-loop, 190-helix and the 220-loop, determine the specificity for either the avian-type receptor or human-type receptor, α2-3-linked sia or α2-6-linked sia, respectively.
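as an illustration of how the cleavage site motifs listed above can be compared, the following python sketch counts the uninterrupted run of basic residues (arginine or lysine) directly upstream of the cleavage position. it is a simplified heuristic written for this text, not a published pathotyping rule: the function name is hypothetical, and no threshold separating multibasic from monobasic sites is asserted here.

```python
def upstream_basic_run(motif, cleavage_mark="↓"):
    """Length of the consecutive R/K run immediately before the cleavage mark.

    `motif` is a one-letter amino acid string containing the cleavage mark,
    for example "pqresrrkkr↓g". Case is ignored.
    """
    upstream = motif.upper().split(cleavage_mark)[0]
    run = 0
    for residue in reversed(upstream):
        if residue not in "RK":
            break
        run += 1
    return run


# Motifs reported in the text for the 35 isolates.
observed_motifs = [
    "pqresrrkkr↓g",
    "pqregrrkkr↓g",
    "pqreskrkkr↓g",
    "pqresrrrkr↓g",
    "pqresrrkrr↓g",
]
runs = {motif: upstream_basic_run(motif) for motif in observed_motifs}
```

all five reported variants keep the same five-residue basic stretch next to the cleavage position; the variation lies either within that stretch (r/k exchanges) or one residue upstream of it (s to g).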
several key residues within these domains have been identified at positions including 186, 190, 193, 224, 226 and 228 (h3 numbering). high conservation of amino acid sequences was found at the receptor binding site (rbs). all isolates possessed the conserved residues n186, e190, n224, q226 and g228, the most apparent residues involved in receptor binding specificity (as reviewed in [50]), indicating preferential binding of the viruses to avian-like α2-3-linked sia [51]. interestingly, polymorphism was observed at position 193, at which a methionine or isoleucine was observed instead of the more common arginine or lysine. however, the exact role of this residue in receptor specificity remains to be determined [52, 53]. n-linked glycosylation of the influenza virus ha protein plays important roles in protein folding and modulates virus pathogenicity and evasion of neutralizing antibodies [54, 55]. in addition, glycans within the vicinity of the rbs region may alter receptor binding affinity and/or specificity. like other clade 2.1 viruses, the h5n1 viruses from indonesia contain seven potential n-linked glycosylation sites. n-linked glycosylation sites at positions 14, 15, 27, 290 and 488 are highly conserved among many ha subtypes [56]. in addition, h5n1 viruses can contain n-linked glycosylation sites at positions 158 and 169. the absence of glycosylation site 158 was linked to human receptor specificity and affinity, and to aerosol transmission in ferrets, an animal model representative of aerosol transmission between humans [57, 58]. however, no substitution (i.e., n158x or t160x) removing this glycosylation site was observed in the new indonesian samples. besides changes to receptor specificity and glycosylation patterns, the protein stability of ha is also important for human host adaptation, transmission and possibly virulence [46].
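the n158x and t160x substitutions mentioned above remove a glycosylation site because a potential n-linked site is the sequon n-x-s/t, where x is any residue except proline. a minimal python sketch of such a scan is given below. the function names and the toy peptide are hypothetical, and positions are simple 1-based indices into the supplied string; mapping them to h3 or h5 numbering would require an alignment that is not shown.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of the asparagine in each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


def removes_sequon(sequence, position, new_residue):
    """True if substituting `new_residue` at 1-based `position` removes
    at least one sequon that is present in the original sequence."""
    mutated = sequence[: position - 1] + new_residue + sequence[position:]
    return bool(set(find_sequons(sequence)) - set(find_sequons(mutated)))


# Toy peptide (not an ha sequence): one sequon starting at position 2.
toy = "ANGTA"
toy_sites = find_sequons(toy)
```

in the toy peptide, replacing either the asparagine or the threonine of the sequon abolishes the site, mirroring the two kinds of substitution (at the n or at the s/t position) named in the text.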
nonetheless, we did not find in any of the 35 virus isolates any ha substitutions (i.e., h103y, t315i [57, 58] and y351h, h352q and k387i (h3 numbering) [59, 60]) that are known to increase replication and virulence of avian influenza virus h5n1 or h7n9 in mammalian animal models by mediating ha protein stability. however, it is expected that other positions and substitutions within the ha trimer could affect stability and therefore be involved in human adaptation and transmissibility of h5n1 viruses, which would require further research to identify. all of the novel hpai h5n1 isolates were found to contain the deletion of amino acids between positions 49 and 68 in the stalk region of their na glycoprotein. this shorter stalk length of na was previously linked to increased virulence of h5n1 viruses in mammals [61] [62] [63]. furthermore, the neuraminidase (na) protein serves as a target for na inhibitors (nai) such as oseltamivir, zanamivir, peramivir and laninamivir, which block the na enzyme active site to limit influenza virus egress. eighteen of 35 patients were treated with oseltamivir in our study. however, none of the newly isolated viruses encoded known nai resistance mutations (i.e., v116a, i117v, e119v, g136k, v149a, r156k, d198n, s246n, h275y, r293k, n295s (n1 numbering)) [64]. this corresponds with our earlier study showing that acquisition of nai resistance is extremely rare in h5n1-infected individuals in indonesia, be it before or during treatment [11]. of note, q136h was observed in 14 of the 35 isolates. although q136l is associated with reduced sensitivity to zanamivir and oseltamivir, q136h had no effect on sensitivity to nai when tested in h1n1pdm2009 or h3n2 [65]. we previously showed that substitutions related to amantadine resistance are common in h5n1 viruses in indonesia [11] even though amantadine treatment is not administered anymore.
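screening an isolate against a list of known markers, as done above for the nai resistance substitutions, amounts to checking whether the residue at each marker position equals the substituted residue. the python sketch below shows one way to do this. it assumes the isolate has already been reduced to a position-to-residue mapping in the same numbering as the marker list (n1 numbering here); the function names and the example isolates are hypothetical, and the marker list is copied from the text.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# NAI resistance markers as listed in the text (n1 numbering).
NAI_MARKERS = [
    "v116a", "i117v", "e119v", "g136k", "v149a", "r156k",
    "d198n", "s246n", "h275y", "r293k", "n295s",
]


def parse_substitution(label):
    """Split a label such as 'h275y' into ('H', 275, 'Y')."""
    match = SUBSTITUTION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def screen_markers(residues, markers):
    """Return the markers whose substituted residue is present.

    `residues` maps a position to the one-letter residue found in the isolate.
    """
    hits = []
    for label in markers:
        _, position, substituted = parse_substitution(label)
        if residues.get(position, "").upper() == substituted:
            hits.append(label.upper())
    return hits


# Hypothetical isolates: one carrying q136h only, one carrying h275y.
isolate_a = {136: "H", 275: "H"}
isolate_b = {136: "Q", 275: "Y"}
```

note that q136h in the first example is not reported, because only the exact substituted residue of a listed marker counts as a hit; this matches the way q136h is treated separately from the listed markers in the text.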
the prevalence of amantadine resistance-related substitutions increased over time from 37.5% in 2005, to 95% in 2009 and 100% during subsequent years [11]. various amantadine resistance substitutions in the m2 protein were also found in all 35 isolates, including v27a (34 viruses), v27t (1 virus), s31g (1 virus), and s31n (5 viruses). interestingly, isolates in this study collected in more recent years often encode resistance mutations at both positions 27 and 31. these results indicate that indonesian h5n1 viruses are sensitive to na inhibitors but resistant to m2 inhibitors, despite the absence of amantadine treatment [11]. besides receptor specificity, polymerase activity is known to be a hallmark for host adaptation and virulence. the polymerase complex is a heterotrimer that consists of the pb2, pb1 and pa subunits. the pb2 protein is an important determinant of virulence and host range. pb2 substitutions such as e627k and d701n [12] can dramatically increase polymerase activity of avian influenza viruses in mammalian cells. in particular, pb2-e627k is a key molecular determinant of host range [66] and a virulence factor during human infection with hpai h5n1 [67]. both pb2-e627k (5 of 35 viruses) and pb2-d701n (1 virus) substitutions were observed in a limited number of the novel viruses presented in this study. this is in strong contrast with human h5n1 viruses collected in other geographic regions, where the pb2-e627k substitution is common [68]. we also checked for other substitutions in the polymerase complex and nucleoprotein that previous studies reported to enhance polymerase activity of avian influenza viruses in human cells [69]. while there is some genetic variation at some of these positions, there were no obvious markers of human adaptation or virulence that could be linked to the high case fatality rate of h5n1 infections in humans in indonesia. both pb1-f2 and ns1 have immune regulatory roles for influenza virus.
the full-length pb1-f2 protein (90 aa) inhibits type i interferon response mediated by the mitochondrial antiviral signaling protein [70, 71] . however, the open reading frame (orf) of the auxiliary protein which occurs in the second orf of the pb1 gene segment can be truncated or lost [70, 72] . all of the indonesian h5n1 viruses were found to encode the full-length pb1-f2 protein. furthermore, the n66s substitution in pb1-f2 known to increase virulence [70] was not found in any of the novel viruses. on the other hand, the four-amino-acid sequence motif (esev) at the carboxyl terminus of ns1 facilitates the nonstructural protein to bind to cellular pdz-containing proteins that are involved in host cellular signaling pathways [73] . the esev motif was found in all of the new indonesian isolates. ns1 mutations such as p42s, d87e, l98f and i101m were also found to modulate the virulence of h5n1 viruses [73] . additionally, substitutions in ns1 (n200s, g205r) as well as ns2 (t47a, m51i) proteins may result in decreased antiviral responses in the host [74] . however, none of these substitutions were found in the 35 h5n1 viruses. besides the presence of virulence and human adaptive markers, it is important to monitor and understand the antigenic properties of circulating influenza viruses. vaccination is a primary measure to control or prevent h5n1 outbreaks in poultry and could be used to protect humans from h5n1 infections, should these viruses become pandemic. influenza viruses can easily escape from available vaccines by substitutions in the major antigenic sites on the globular head domain of ha [75] . here we investigated the antigenic properties of these human hpai h5n1 viruses. to characterize the antigenic diversity of indonesian human influenza h5n1 viruses by hi test, we selected a panel of ferret antisera able to detect antigenic variation between representative viruses [40] . we included ferret antisera against clade 2. 
we determined the antigenic properties of 25 isolates analyzed in hi assays using this panel of ferret antisera. antigenic cartography was used to visualize the antigenic relatedness of the hpai h5n1 isolates in a 2d space (fig. 2). the antigenic map showed that the human hpai h5n1 viruses from indonesia clustered into two antigenic groups. the first group of viruses clustered around two representative antisera of clade 2. this finding is similar to that of a previous study describing different antigenic clusters within avian h5n1 viruses isolated from poultry in indonesia [40]. that study identified a small number of residues immediately adjacent to the rbs within the antigenic sites in the globular head domain of ha, which are primarily responsible for antigenic changes. we investigated genetic diversity at these positions. amino acid differences were identified at these six antigenically important positions located near the receptor binding site [40]: 129, 133, 151, 183, 185 and 189 (h5 numbering, as there is no equivalent for position 129 in h3n2 viruses; table 2). all viruses clustering into the first antigenic group possessed residues s129, s133, i151, n183, a185 and m189, except for isolates 11,046 and 12,452, which contained t183 and i189, respectively. the n183t substitution does not seem to have an antigenic effect, although m189i could have a small antigenic effect, as indicated by the placement of this isolate on the outside of the cluster. the virus isolates of cluster 2 contained residues s129, s133, i151, d183, a185 and r189, typical for antigenically a/indonesia/5/2005-like viruses. the study by koel et al. previously showed that substitutions d183n and r189m are indeed responsible for the antigenic differences between these two clusters. isolate 10,364 contained a133; however, this substitution does not seem to have major antigenic effects as this isolate clusters with other viruses. it was previously shown that a combination of substitutions at positions 133 and 185 is necessary to result in antigenic changes [40]. these data showed that viruses belonging to two distinct antigenic groups infected humans in indonesia. interestingly, the presence of viruses from different years of isolation in both clusters (as indicated by the color coding of the viruses in fig. 2) indicates that these different antigenic variants were co-circulating. full protection in humans would therefore have likely required a multivalent vaccine, including at least a/indonesia/5/2005- and a/chicken/east java/121/2010-like viruses, which are currently under development or approved for human and/or poultry use [17, 76].
fig. 2 caption (fragment): both axes represent antigenic distance: one square on the antigenic map represents a distance of one antigenic unit, corresponding to a twofold difference in the hi assay. the antigenic map was generated using antigenic cartography, a method that uses multidimensional scaling algorithms to place virus and antiserum points in a 2d space such that their relative position in the map reflects the hi titers with minimal error. the distance between a virus-and-antiserum pair is inversely related to the hi titer of the virus to that antiserum. the color coding of the human hpai h5n1 isolates is based on their year of isolation as depicted in fig. 1. virus isolate names and antisera are abbreviated to isolate number/year.
indonesia has suffered numerous hpai h5n1 virus outbreaks in poultry farms, live bird markets and backyard poultry, which have resulted in 200 reported human cases with a case fatality rate of over 80%. this high case fatality rate is in sharp contrast with lower case fatality rates in we recently showed that the high case fatality rate of human indonesian h5n1 cases correlated with viral load prior to treatment and increased from 65% in 2005 to 100% since 2012 [11].
we found that this high case fatality rate coincided with the high prevalence of amantadine resistance-conferring m2 substitutions; however, no mechanistic explanation for the role of such substitutions in virulence of hpai h5n1 is known yet. the aforementioned study did not include any further sequencing data looking into specific virulence and human adaptive markers. our analyses of the full genomic sequences for the 35 isolates did not indicate any potential genetic changes that might explain the increase in case fatality rate and virulence over time. we did not find any known genetic markers associated with human adaptation or virulence in the ha or polymerase complex genes. there were no changes in the rbs region of ha that were indicative of a switch towards the human-type receptor. although some genetic diversity was observed in the polymerase genes, well-known substitutions such as pb2-e627k and pb2-d701n, which are often selected upon infection of humans and affect the virulence of avian influenza viruses such as h5n1 [12, 66, 67, 79], were not commonly found in the new samples. however, there could also be other currently unknown adaptive substitutions present in the new human samples. confirmatory investigations into the effects of these substitutions on the activity of the polymerase complexes of both human and avian hpai h5n1 viruses should be done in future studies. further sampling and research, involving the collection of more full genome sequences from avian and human viruses as well as both in vitro and in vivo characterization of virus replication and pathogenicity, are warranted to determine if indonesian hpai h5n1 viruses are indeed more virulent than h5n1 viruses of other genetic clades circulating in other geographic areas.
this should also address whether there is a specific selection for more virulent viruses in humans only or whether indonesian hpai a/h5n1 viruses are more virulent in general, including in poultry in indonesia, such that zoonotic events involve more virulent viruses and result in higher case fatality rates. further characterization of virus isolates from different periods of time will have to show if more recent viruses are indeed more virulent and whether this could be attributed to specific molecular markers. a possible reason why human h5n1 cases have declined is the implementation of large-scale vaccination of poultry against h5n1. most countries affected by h5n1 virus outbreaks, including indonesia, have implemented poultry vaccination as a key strategy for the control of h5n1 infections. currently, the h5n1 vaccine for poultry in indonesia is an inactivated bivalent vaccine containing h5n1 viruses belonging to clades 2.3.2 and 2.1.3. as for the selection of human seasonal vaccine strains, antigenic analyses are required to understand and predict vaccine effectiveness. from the current study, antigenic analyses of human h5n1 viruses from 2008 till 2015 identified two antigenic groups of human clade 2.1.3 viruses that co-circulate in indonesia. based on a previous study by koel et al., the antigenic differences could be explained by alternative amino acids present at several key residues at the rim of the receptor binding site. no evidence of antigenic change over time or of association with geographical location was observed. therefore, a combination of two of the currently available pre-pandemic human h5n1 vaccines, a/indonesia/5/2005 and a/indonesia/nihrd11771/2011, would have been required to optimize protection against the two different antigenic groups. in summary, we performed genetic and antigenic analyses of h5n1 influenza viruses isolated from humans between 2008 and 2015.
we observed low levels of genetic diversity and only sporadic prevalence of known substitutions associated with human adaptation and virulence (e.g., pb2-627k). however, the analysis only captured the majority variants and did not include minority variants present during infection. additionally, we have limited our genetic analyses to known substitutions only. to ascertain and better understand the high mortality associated with human hpai h5n1 virus infections in indonesia, it is essential to perform more in-depth analysis of genetic diversity during human infections with hpai h5n1 virus and to functionally characterize the observed substitutions. furthermore, our data showed that two antigenic groups co-circulated in indonesia, with no evidence of antigenic change over time. a combination of available pre-pandemic vaccines would have been required to protect against the viruses circulating during the study period.
global epidemiology of avian influenza a h5n1 virus infection in humans, 1997-2015: a systematic review of individual case data lessons from emergence of a/goose/guangdong/1996-like h5n1 highly pathogenic avian influenza viruses and recent influenza surveillance efforts in southern china summary of avian influenza activity in europe world organization for animal health (oie) cumulative number of confirmed human cases of avian influenza a/(h5n1) reported to who phylogenetic clustering by linear integer programming (phy-clip) world health organization/world organisation for animal hf, agriculture organization hnewg (2014) revised and updated nomenclature for highly pathogenic avian influenza a (h5n1) viruses.
influenza other respir viruses h5n1 hpai global overview overview on poultry sector and hpai situation for indonesia with special emphasis on the island of java genetic characterization of clade 2.3.2.1 avian influenza a(h5n1) viruses viral factors associated with the high mortality related to human infections with clade 2.1 influenza a/ h5n1 virus in indonesia fatal outcome of human influenza a (h5n1) is associated with high viral load and hypercytokinemia indonesia national committee for avian influenza control and pandemic influenza preparedness. national strategic plan for avian influenza control and pandemic influenza preparedness antigenic and genetic characteristics of zoonotic influenza viruses and development of candidate vaccine viruses for pandemic preparedness field effectiveness of highly pathogenic avian influenza h5n1 vaccination in commercial layers in indonesia overview on poultry sector and hpai situation for indonesia with special emphasis on the island of java antibody titer has positive predictive value for vaccine protection against challenge with natural antigenic-drift variants of h5n1 high-pathogenicity avian influenza viruses from indonesia evidence for differing evolutionary dynamics of a/ h5n1 viruses among countries applying or not applying avian influenza vaccination in poultry fao (2013) fifth report on the global programme for the prevention and control of hpai avian influenza a (h5n1) infection in humans risk factors of poultry outbreaks and human cases of h5n1 avian influenza virus infection in west java province, indonesia pandemic preparedness and the influenza risk assessment tool (irat) world health organization (2006) who guidelines for investigation of human cases of avian influenza a(h5n1) pedoman pengambilan dan pengiriman spesimen yang berhubungan dengan flu burung cdc realtime rtpcr (rrtpcr) protocol for detection and characterization of swine influenza (version antigenic variation in h5n1 clade 2.1 viruses in indonesia 
from universal primer set for the full-length amplification of all influenza a viruses identification, characterization, and natural selection of mutations driving airborne transmission of a/h5n1 virus genome analysis linking recent european and african influenza (h5n1) viruses genetic diversity and host adaptation of avian h5n1 influenza viruses during human infection bioedit: an important software for molecular biology raxml-iii: a fast program for maximum likelihood-based inference of large phylogenetic trees using raxml to infer phylogenies continued evolution of highly pathogenic avian influenza a(h5n1): updated nomenclature world health organization/world organisation for animal hf, agriculture organization hewg. nomenclature updates resulting from the evolution of avian influenza a(h5) virus clades 2.1.3.2a, 2.2.1, and 2.3.4 during 2013-2014. influenza other respir viruses gisaid: global initiative on sharing all influenza data: from vision to reality genetic changes inventory: a tool for influenza surveillance and preparedness studies of antigenic differences among strains of influenza a by means of red cell agglutination world health organization. 
manual for the laboratory diagnosis and virological surveillance of influenza antigenic variation of clade 2.1 h5n1 virus is determined by a few amino acid substitutions immediately adjacent to the receptor binding site mapping the antigenic and genetic evolution of influenza virus unggas kondisi s/d 31 oktober karlsson ea (2019) inventory of molecular markers affecting biological characteristics of avian influenza a viruses host and viral determinants of influenza a virus species specificity adaptation of avian influenza a virus polymerase in mammals to overcome the host species barrier host adaptation and transmission of influenza a viruses in mammals h5n1 genetic changes inventory: a tool for influenza surveillance and preparedness molecular pathogenesis of h5 highly pathogenic avian influenza: the role of the haemagglutinin cleavage site motif the multibasic cleavage site of the hemagglutinin of highly pathogenic a/vietnam/1203/2004 (h5n1) avian influenza virus acts as a virulence factor in a host-specific manner in mammals h5n1 receptor specificity as a factor in pandemic risk structure and receptor specificity of the hemagglutinin from an h5n1 influenza virus enhanced human-type receptor binding by ferret-transmissible h5n1 with a k193t mutation recent avian h5n1 viruses exhibit increased propensity for acquiring human receptor specificity the n-linked glycosylation site at position 158 on the head of hemagglutinin and the virulence of h5n1 avian influenza virus in mice influenza virus n-linked glycosylation and innate immunity glycosylation focuses sequence variation in the influenza a virus h1 hemagglutinin globular domain airborne transmission of influenza a/ h5n1 virus between ferrets experimental adaptation of an influenza h5 ha confers respiratory droplet transmission to a reassortant h5 ha/h1n1 virus in ferrets amino acid substitutions that affect receptor binding and stability of the hemagglutinin of influenza a/h7n9 virus amino acid residues in the 
fusion peptide pocket regulate the ph of activation of the h5n1 influenza virus hemagglutinin protein influenza virus neuraminidase structure and functions the neuraminidase stalk deletion serves as major virulence determinant of h5n1 highly pathogenic avian influenza viruses in chicken a 20-amino-acid deletion in the neuraminidase stalk and a fiveamino-acid deletion in the ns1 protein both contribute to the pathogenicity of h5n1 avian influenza viruses in mallard ducks summary of neuraminidase amino acid substitutions associated with reduced inhibition by neuraminidase inhibitors zanamivir-resistant influenza viruses with q136k or q136r neuraminidase residue mutations can arise during mdck cell culture creating challenges for antiviral susceptibility monitoring a single amino acid in the pb2 gene of influenza a virus is a determinant of host range molecular basis for high virulence of hong kong h5n1 influenza a viruses the effect of the pb2 mutation 627k on highly pathogenic h5n1 avian influenza virus is dependent on the virus lineage multiple polymerase gene mutations for human adaptation occurring in asian h5n1 influenza virus clinical isolates a single n66s mutation in the pb1-f2 protein of influenza a virus increases virulence by inhibiting the early interferon response in vivo influenza a virus pb1-f2 protein contributes to viral pathogenesis in mice a novel influenza a virus mitochondrial protein that induces cell death a new influenza virus virulence determinant: the ns1 protein four c-terminal residues modulate pathogenicity the ha and ns genes of human h5n1 influenza a virus contribute to high virulence in ferrets substitutions near the receptor binding site determine major antigenic change during influenza virus evolution summary of status of development and availability of a(h5n1) candidate vaccine viruses and potency testing reagents a molecular and antigenic survey of h5n1 highly pathogenic avian influenza virus isolates from smallholder duck farms in 
central java, indonesia during phylogenetic characterization of h5n1 avian influenza viruses isolated in indonesia from selection of h5n1 influenza virus pb2 during replication in humans