item: #1 of 119 id: cord-000257-ampip7od author: Bagowski, Christoph P title: The Nature of Protein Domain Evolution: Shaping the Interaction Network date: 2010-08-17 words: 4681 flesch: 35 summary: In this review, we aim to describe the basic concepts of protein domain evolution and illustrate recent developments in molecular evolution that have provided valuable new insights in the field of comparative genomics and protein interaction networks. This approach thus primarily focuses on the similarities and differences of the orthologous genes within the network, and is therefore ideally suited for the study of protein domain evolution and has already revealed that species-specific parts Fig. keywords: analysis; binding; domains; evolution; expression; gene; genome; interaction; network; protein; sequence cache: cord-000257-ampip7od.txt plain text: cord-000257-ampip7od.txt item: #2 of 119 id: cord-000473-jpow6iw1 author: Astrovskaya, Irina title: Inferring viral quasispecies spectra from 454 pyrosequencing reads date: 2011-07-28 words: 5369 flesch: 49 summary: The software provided by instrument manufacturers was originally designed to assemble all reads into a single genome sequence and cannot be used for reconstructing quasispecies sequences. Since the number of different st-paths is exponential, we wish to generate a set of paths that have a high probability of corresponding to real quasispecies sequences. keywords: candidate; mismatches; quasispecies; reads; sequences; sequencing; shorah; vispa cache: cord-000473-jpow6iw1.txt plain text: cord-000473-jpow6iw1.txt item: #3 of 119 id: cord-000642-mkwpuav6 author: Moreira, Rebeca title: Transcriptomics of In Vitro Immune-Stimulated Hemocytes from the Manila Clam Ruditapes philippinarum Using High-Throughput Sequencing date: 2012-04-19 words: 6864 flesch: 41 summary: Hits to R. philippinarum sequences were represented in a Venn diagram. 
The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum. keywords: analysis; bivalves; clam; contigs; expression; factor; genes; immune; philippinarum; proteins; recognition; response; ruditapes; sequences; species; transcriptome cache: cord-000642-mkwpuav6.txt plain text: cord-000642-mkwpuav6.txt item: #4 of 119 id: cord-001340-kqcx7lrq author: Ladner, Jason T. title: Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing date: 2014-06-17 words: 2513 flesch: 34 summary: Despite the small sizes of viral genomes, complications related to limited RNA quantities, host contamination, and secondary structure mean that it is often not time-or cost-effective to finish every genome, and given the intended use, finishing may be unnecessary (5) . One of the most common and important applications for viral genomes is in the study of viral epidemiology, which encompasses our understanding of the patterns, causes, and effects of disease. keywords: characterization; coverage; genome; sequences; sequencing; viruses cache: cord-001340-kqcx7lrq.txt plain text: cord-001340-kqcx7lrq.txt item: #5 of 119 id: cord-001537-i34vmfpp author: Lima, Francisco Esmaile de Sales title: Genomic Characterization of Novel Circular ssDNA Viruses from Insectivorous Bats in Southern Brazil date: 2015-02-17 words: 3883 flesch: 46 summary: Sequence analyses were performed with the BLASTX software (http://www.ncbi.nlm.nih.gov/blast/). Pan-reactive primers were used targeting the conserved rep region of circoviruses and cycloviruses to screen DNA bat fecal samples. 
keywords: batcv; bats; cap; circoviruses; cyclovirus; dna; genomes; rep; samples; sequences cache: cord-001537-i34vmfpp.txt plain text: cord-001537-i34vmfpp.txt item: #6 of 119 id: cord-001786-ybd8hi8y author: Dutilh, Bas E title: Metagenomic ventures into outer sequence space date: 2014-12-15 words: 2194 flesch: 37 summary: However, it remains an open question, what is the actual size of biological sequence space? keywords: metagenomics; sequence; sequencing; space; unknowns cache: cord-001786-ybd8hi8y.txt plain text: cord-001786-ybd8hi8y.txt item: #7 of 119 id: cord-001835-0s7ok4uw author: None title: Abstracts of the 29th Annual Symposium of The Protein Society date: 2015-10-01 words: 138771 flesch: 38 summary: In conclusion, the analysis of hydropathic environments strongly suggests that the orientation of a residue in a three-dimensional structure is a direct consequence of its hydropathic environment, which leads us to propose a new paradigm, interaction homology, as a key factor in protein structure. In computer simulation modeling of protein structure in a solvent medium, explicit, implicit, and effective-medium approaches are often adopted to incorporate the effects of solvation. 
keywords: acid; activation; activity; addition; affinity; amino; amyloid; analysis; antibodies; antibody; antigen; approach; assay; assembly; associated; bacterial; binding; biology; bonds; cancer; catalytic; cell; cellular; chain; changes; characterization; chemical; chemistry; coli; complex; computational; concentration; conditions; conformation; conserved; control; core; cross; crystal; crystal structure; data; department; determine; development; dimer; disease; disordered; disordered proteins; dna; docking; domain; drug; effect; energy; enzyme; essential; experiments; expression; factors; family; fluorescence; fluorescent protein; formation; forms; fragments; free; functions; gene; group; helix; human; hydrogen; hydrophobic; important; increase; inhibitors; institute; key; kinetic; level; ligand; light; like; lipid; loop; low; major; mass; mechanism; membrane protein; method; model; modification; molecular; molecules; motif; mutant; mutations; n protein; native; nature; new; nmr; non; novel; number; oligomers; order; pathways; peptide; potential; prediction; presence; present; process; processes; properties; protease; protein; protein aggregation; protein association; protein complexes; protein concentration; protein data; protein degradation; protein design; protein domain; protein dynamics; protein engineering; protein evolution; protein expression; protein families; protein folding; protein function; protein interactions; protein interface; protein kinases; protein molecules; protein production; protein sequences; protein stability; protein structure; protein surface; protein tyrosine; provide; range; ray; reaction; receptor; recognition; recombinant; region; regulation; research; residues; response; results; rna; role; self; sequence; set; shows; signal; signaling; simulations; site; size; solution; species; specific; specificity; spectroscopy; stability; state; step; structural; studies; study; substrate; subunit; surface; synthetic; system; target; target 
protein; tau protein; techniques; temperature; terminal; time; transcription; transfer; transition; transmembrane; type protein; understanding; unfolding; university; use; variants; virus; vitro; wild; work; yeast cache: cord-001835-0s7ok4uw.txt plain text: cord-001835-0s7ok4uw.txt item: #8 of 119 id: cord-001974-wjf3c7a7 author: Friis-Nielsen, Jens title: Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers date: 2016-02-19 words: 5776 flesch: 43 summary: Sequence clusters that have been described in detail throughout the manuscript have been included as supplementary files. A grouping based on taxonomy, or a more data-driven approach that clusters sequence groups based on the associated datasets as seen in Figure 2, could be included as another iteration to properly strengthen the statistical associations. keywords: associations; cancer; clustering; clusters; contigs; data; features; human; parameters; samples; sequences; sequencing; species; table; virus cache: cord-001974-wjf3c7a7.txt plain text: cord-001974-wjf3c7a7.txt item: #9 of 119 id: cord-002473-2kpxhzbe author: Das, Jayanta Kumar title: Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach date: 2017-03-31 words: 4616 flesch: 56 summary: Current protocols in molecular biology Phylogenetic analysis of protein sequences based on conditional LZ complexity Analyzing and synthesizing phylogenies using tree alignment graphs A probabilistic measure for alignment-free sequence comparison Simplification of protein sequence and alignment-free sequence analysis Phylogenies and the comparative method Progressive sequence alignment as a prerequisite to correct phylogenetic trees Graph theory with applications to engineering and computer science Protein flexibility predictions using graph theory Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features 
Use of information discrepancy measure to compare protein secondary structures 2-D graphical representation of protein sequences and its application to coronavirus phylogeny A 2D graphical representation of protein sequence and its numerical characterization Similarity/dissimilarity analysis of protein sequences based on a new spectrum-like graphical representation Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences Secondly, we build a graph theoretic model on using amino acid sequences which is also applied to the cytochrome c7 family members and some unique characteristics and their domains are highlighted. keywords: acids; amino; chemical; graph; group; ppca; ppcd; protein; sequence cache: cord-002473-2kpxhzbe.txt plain text: cord-002473-2kpxhzbe.txt item: #10 of 119 id: cord-003316-r5te5xob author: Balloux, Francois title: From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic date: 2018-12-17 words: 7342 flesch: 29 summary: However, a micro-costing analysis covering laboratory and personnel costs estimated the cost of clinical WGS to £481 per M. tuberculosis isolate versus £518 applying standard methods, representing relatively marginal cost savings but with significant time savings [63] . Somewhat ironically, the extremely rich information of WGS data, with every genome being unique, generates problems of its own. keywords: amr; analysis; costs; data; diagnostics; example; genome; microbiology; outbreak; resistance; sequence; sequencing; time; transmission; virulence; wgs cache: cord-003316-r5te5xob.txt plain text: cord-003316-r5te5xob.txt item: #11 of 119 id: cord-004862-yv76yvy5 author: Demers, G. 
William title: The L1 family of long interspersed repetitive DNA in rabbits: Sequence, copy number, conserved open reading frames, and similarity to keratin date: 1989 words: 6680 flesch: 55 summary: In this paper, the rabbit L1 repeats are characterized more thoroughly, and the similarities and differences of L1 sequences between species are explored further. Therefore, the overlap between reading frames 1 and 2 is conserved in mouse L1s, but the overlaps are not seen in the rabbit and human L1 sequences. keywords: dna; end; et al; fig; orf-1; rabbit; region; repeats; sequence cache: cord-004862-yv76yvy5.txt plain text: cord-004862-yv76yvy5.txt item: #12 of 119 id: cord-004879-pgyzluwp author: None title: Programmed cell death date: 1994 words: 81833 flesch: 47 summary: Bcl-2α is a mitochondrial or perinuclear-associated oncoprotein that prolongs the life span of a variety of cell types by interfering with programmed cell death. Single and repetitive uptake and release of CPZ were measured in each cell type after individual exposure or exposure in any combination of cell types: In 2-hour competitive uptake studies, fibroblasts reached 1.7 and 2.6 times the concentrations of C6- and ROC-cells, respectively. 
keywords: acid; activation; activity; addition; adult; amino; analysis; animals; antibodies; binding; brain; calcium; cdna; cell lines; cells; changes; cloned; complex; concentrations; conditions; contrast; control; cultures; current; data; days; decrease; development; different; differentiation; dna; domain; early; effects; end; enzyme; epithelial; experiments; expression; extracts; factor; family; fold; form; formation; function; fusion; gene; gene expression; growth; homology; hormone; human; increase; induction; infected; inhibition; institut; interaction; intracellular; kda; kinase; levels; major; mammalian; mechanisms; medium; membrane; mice; molecular; mouse; mrna; muscle; mutant; nerve; neuronal; neurons; new; non; nuclear; nucleus; number; order; pathway; phosphorylation; play; positive; potential; presence; present; process; production; promoter; properties; protein; protein expression; rat; rate; rats; reaction; receptor; recombinant; recombination; region; regulation; release; replication; response; results; rna; role; sequence; signal; sites; species; specific; stage; stimulation; structure; studies; study; subunit; surface; synthesis; system; t cells; target; terminal; time; tissue; tnf; transcription; treatment; tumor; type; university; virus; vitro; vivo; yeast cache: cord-004879-pgyzluwp.txt plain text: cord-004879-pgyzluwp.txt item: #13 of 119 id: cord-005060-n901y2d4 author: ZHANG, Feiyun title: Complete Nucleotide Sequence of Ryegrass Mottle Virus : A New Species of the Genus Sobemovirus date: 2001 words: 2606 flesch: 53 summary: Sobemovirus genome appears to encode a serine protease related to cysteine proteases of picornaviruses Genus sobemovirus Signals for ribosomal frameshifting in the rous sarcoma virus gag-pol region Characterization of ribosomal frameshift in HIV-1 gag-pol expression The putative replicase of the cocksfoot mottle sobemovirus is translated as a part of the polyprotein by -1 ribosomal frameshift Sequence and organization 
of barley yellow dwarf virus genomic RNA Luteovirus gene expression genome characterization of rice yellow mottle virus RNA Nucleotide sequence of the bean strain of southern bean mosaic virus Identification of four conserved motifs among the RNA-dependent polymerases encoding elements Messenger RNA for the coat protein of southern bean mosaic virus Nucleotide sequence of RNA from the sobemovirus found in infected cocksfoot shows a luteovirus-like arrangement of the putative replicase and protease genes Translation of southern bean mosaic virus RNA in wheat embryo and rabbit reticulocyte extracts Complementarity between the 5'-and 3'-terminal sequences of rice stripe virus RNAs Identification of genes encoding for the cocksfoot mottle virus proteins Cocksfoot mottle virus in Japan Ryegrass mottle virus, a new virus from Lolium multiflorum in Japan Nucleotide sequence of RNA 1, the largest genomic segment of rice stripe virus, the prototype of the tenuivirus The genome-linked protein (VPg) of southern bean mosaic virus is encoded by the ORF2 Guidelines to the demarcation of virus species Sequence and organization of southern bean mosaic virus genomic RNA Evolution of RNA viruses Analysis of the in vitro translation products of RGMoV RNA suggests that the 68 kDa protein may represent a fusion protein of ORF 2-ORF 3 produced by frameshifting. keywords: amino; kda; orf; protein; rgmov; rna; sequence; virus cache: cord-005060-n901y2d4.txt plain text: cord-005060-n901y2d4.txt item: #14 of 119 id: cord-010161-bcuec2fz author: Matson, David O. title: IV, 6. Calicivirus RNA recombination date: 2004-09-14 words: 3338 flesch: 37 summary: It is clear that such clades are related to differences in capsid gene sequences; sequence differences are less marked in the RNA polymerase gene: when RNA polymerase region sequences are analyzed in phylogenetic analyses, statistically significant differences similar to those observed among capsid gene sequences do not occur . 
Models for RNA virus recombination have utilized two terminologies to describe the degree that features of the donor and acceptor templates are shared: homologous, aberrant homologous, and non-homologous types (Lai and Cavanagh, 1997) or sequence similarity-essential, similarity-assisted, and similarity-nonessential (Nagy and Simon, 1997). keywords: capsid; cvs; recombination; rna; sequence; strains cache: cord-010161-bcuec2fz.txt plain text: cord-010161-bcuec2fz.txt item: #15 of 119 id: cord-010260-8lnpujip author: Anthonsen, Henrik W. title: The blind watchmaker and rational protein engineering date: 1994-08-31 words: 17358 flesch: 42 summary: A practical approach The modelling of electrostatic interactions in the function of globular proteins Electrostatic interactions in globular proteins: Calculation of the pH dependence of the redox potential of cytochrome c551 Extracting information on folding from the amino acid sequence: Consensus regions with preferred conformation in homologous proteins Prediction of protein secondary structure at better than 70% accuracy Secondary structure prediction of all-helical proteins in two states PHD - An automatic mail server for protein secondary structure prediction Progress in protein structure prediction? Predicting protein secondary structure with a nearest-neighbor algorithm Database of homology-derived protein structures and the structural meaning of sequence alignment An inexpensive, versatile sample illuminator for photo-CIDNP on any NMR spectrometer Pancreatic lipases: Evolutionary intermediates in a positional change of catalytic carboxylates? key: cord-010260-8lnpujip authors: Anthonsen, Henrik W.; Baptista, António; Drabløs, Finn; Martel, Paulo; Petersen, Steffen B. 
title: The blind watchmaker and rational protein engineering date: 1994-08-31 journal: J Biotechnol DOI: 10.1016/0168-1656(94)90152-x sha: doc_id: 10260 cord_uid: 8lnpujip In the present review some scientific areas of key importance for protein engineering are discussed, such as problems involved in deducting protein sequence from DNA sequence (due to posttranscriptional editing, splicing and posttranslational modifications), modelling of protein structures by homology, NMR of large proteins (including probing the molecular surface with relaxation agents), simulation of protein structures by molecular dynamics and simulation of electrostatic effects in proteins (including pH-dependent effects). keywords: acid; alignment; amino; approach; cases; charge; data; engineering; et al; fig; gene; information; interactions; methods; modelling; nmr; number; potential; prediction; protein; protein engineering; protein sequence; protein structure; relaxation; residues; resonance; sequence; site; solution; solvent; structure; use cache: cord-010260-8lnpujip.txt plain text: cord-010260-8lnpujip.txt item: #16 of 119 id: cord-010273-0c56x9f5 author: Simmonds, Peter title: Virology of hepatitis C virus date: 2001-10-10 words: 7904 flesch: 30 summary: Homology of the predominant genotype with the prototype American strain Detection of three types of hepatitis C virus in blood donors: Investigation of type-specific differences in serological reactivity and rate of alanine aminotransferase abnormalities Identification of hepatitis C viruses with a nonconserved sequence of the 5' untranslated region Sequence analysis of the 5' noncoding region of hepatitis C virus At least five related, but distinct, hepatitis C viral genotypes exist Typing of hepatitis C virus isolates and new subtypes using a line probe assay Sequence analysis of the 5' untranslated region in isolates of at least four genotypes of hepatitis C virus in the Netherlands Use of the 5' non-coding region for genotyping 
hepatitis C virus Genotypes of hepatitis C virus in Italian patients with chronic hepatitis C Heterogeneity of hepatitis C virus genotypes in France Genotypic analysis of hepatitis C virus in American patients Hepatitis C virus infection in Egyptian volunteer blood donors in Riyadh Risk factors associated with a high seroprevalence of hepatitis C virus infection in Egyptian blood donors High HCV prevalence in Egyptian blood donors Sequence variability in the 5' non coding region of hepatitis C virus: Identification of a new virus type and restrictions on sequence diversity Geographical distribution of hepatitis C virus genotypes in blood donors: An international collaborative survey New genotype of hepatitis C virus in South-Africa Typing of hepatitis C virus (HCV) genomes by restriction fragment length polymorphisms Distribution of plural HCV types in Japan Clinical backgrounds of the patients having different types of hepatitis C virus genomes Genomic typing of hepatitis C viruses present in China HCV genotypes in China HCV genotypes in different countries Differences in the hepatitis C virus genotypes in different countries Prevalence, genotypes, and an isolate (HC-C2) of hepatitis C virus in Chinese patients with liver disease Imported hepatitis C virus genotypes in Japanese hemophiliacs Genotypic subtyping of hepatitis C virus Survey of major genotypes and subtypes of hepatitis C virus using restriction fragment length polymorphism of sequences amplified from the 5' non-coding region A new type of hepatitis C virus in patients in Thailand Hepatitis C virus variants from Nepal with novel genotypes and their classification into the third major group Hepatitis C virus variants from Vietnam are classifiable into the seventh, eighth, and ninth major genetic groups Prediction of response to interferon treatment of chronic hepatitis C HCV genotypes in chronic hepatitis C and response to interferon Detection of hepatitis C virus by polymerase chain reaction and 
response to interferon-alpha therapy: hepatitis C virus isolates and PCR primers for specific detection Application of six hepatitis C virus genotyping systems to sera from chronic hepatitis C patients in the United States Use of NS-4 peptides to identify type-specific antibody to hepatitis C virus genotypes 1, 2, 3, 4, 5 and 6 Characterization of hypervariable regions in the putative envelope protein of hepatitis C virus Evidence for immune selection of hepatitis C virus (HCV) putative envelope glycoprotein variants: keywords: c virus; cell; cleavage; genome; genotypes; hcv; hepatitis; infection; patients; proteins; region; replication; rna; sequence; virus cache: cord-010273-0c56x9f5.txt plain text: cord-010273-0c56x9f5.txt item: #17 of 119 id: cord-010499-yefxrj30 author: Yelverton, Elizabeth title: The function of a ribosomal frameshifting signal from human immunodeficiency virus‐1 in Escherichia coli date: 2006-10-27 words: 5905 flesch: 52 summary: Protein sequence analysis demonstrated the occurrence of two closely related frameshift mechanisms. Protein sequence analysis of the product indicates the occurrence of two slightly different mechanisms of shifting. 
keywords: amino; codon; cycle; frameshifting; gallant; leucine; limitation; protein; reading; sequence; site; trna cache: cord-010499-yefxrj30.txt plain text: cord-010499-yefxrj30.txt item: #18 of 119 id: cord-011565-8ncgldaq author: Elworth, R A Leo title: To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics date: 2020-06-04 words: 12966 flesch: 47 summary: algorithm Sliding hyperloglog: estimating cardinality in a data stream over a sliding window Using cascading Bloom filters to improve the memory usage for de Bruijn graphs Fast lossless compression via cascading Bloom filters Improving Bloom filter performance on sequence data using k-mer Bloom filters An improved construction for counting Bloom filters Spectral Bloom filters Diversified RACE sampling on data streams applied to metagenomic sequence analysis Repeated And Merged Bloom Filter for Multiple Set Membership Testing (MSMT) in sub-linear time Sub-linear sequence search via a Repeated And Merged Bloom Filter (RAMBO): indexing 170 TB data in 14 hours Efficient generation of transcriptomic profiles by random composite measurements The restricted isometry property and its implications for compressed sensing A simple proof of the restricted isometry property for random matrices Adaptive compressed sensing MRI with unsupervised learning Insense: incoherent sensor selection for sparse signals A data-driven and distributed approach to sparse signal representation and recovery The sparse recovery autoencoder Learned D-AMP: principled neural network based compressive image recovery DeepCodec: adaptive sensing and recovery via deep convolutional neural networks Nanopore metagenomics enables rapid clinical diagnosis of bacterial lower respiratory infection Clinical metagenomics Generating WGS trees with Mashtree Variant tolerant read mapping using min-hashing Beware the Jaccard: the choice of similarity measure is important and non-trivial 
in genomic colocalisation analysis BinDash, software for fast genome distance estimation on a typical personal laptop Dashing: fast and accurate genomic distances with HyperLogLog Finch: a tool adding dynamic abundance filtering to genomic MinHashing Streaming histogram sketching for rapid microbiome analytics Histosketch: fast similarity-preserving sketching of streaming histograms with concept drift kWIP: the k-mer weighted inner product, a de novo estimator of genetic similarity The khmer software package: enabling efficient nucleotide sequence analysis Locality-sensitive hashing for the edit distance Fast search of thousands of short-read sequencing experiments Improved search of large transcriptomic sequencing databases using split sequence bloom trees Ultrafast search of all deposited bacterial and viral genomic data Mash Screen: high-throughput sequence containment estimation for genome discovery Kraken: ultrafast metagenomic sequence classification using exact alignments Fast and sensitive protein alignment using DIAMOND KrakenUniq: confident and fast metagenomics classification using unique k-mer counts Improved metagenomic analysis with Kraken 2 Improving on hash-based probabilistic sequence classification using multiple spaced seeds and multi-index Bloom filters Efficient computation of spaced seeds Ganon: precise metagenomics classification against large and up-to-date sets of reference sequences DREAM-Yara: an exact read mapper for very large databases with short update time Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps A fast approximate algorithm for mapping long reads to large reference databases A novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures Metagenomic binning through low-density hashing The ecologist's field guide to sequence-based identification of biodiversity A reference-free algorithm for computational normalization of shotgun 
sequencing data An improved filtering algorithm for big read datasets and its application to single-cell assembly WGSQuikr: fast whole-genome shotgun metagenomic classification Quikr: a method for rapid reconstruction of bacterial communities via compressive sensing MetaPalette: a k-mer painting approach for metagenomic taxonomic profiling and quantification of novel strain variation MISSION: While ntHash can be faster than xxHash, CityHash and MurmurHash, it is only appropriate for sequence data. keywords: algorithms; bloom; bloom filter; data; datasets; filter; functions; hash; hashing; memory; mers; minhash; number; query; reads; sequence; sequencing; set; similarity; sketch cache: cord-011565-8ncgldaq.txt plain text: cord-011565-8ncgldaq.txt item: #19 of 119 id: cord-012975-u87ol3fs author: Ogiwara, Atsushi title: Construction of a dictionary of sequence motifs that characterize groups of related proteins date: 1992-09-17 words: 3119 flesch: 52 summary: Sequence motifs with multiple blocks or sequence motifs with single blocks without substitution patterns could be used safely for superfamily assignment. key: cord-012975-u87ol3fs authors: Ogiwara, Atsushi; Uchiyama, Ikuo; Seto, Yasuhiko; Kanehisa, Minoru title: Construction of a dictionary of sequence motifs that characterize groups of related proteins date: 1992-09-17 journal: keywords: database; motif; patterns; sequence; superfamilies; superfamily cache: cord-012975-u87ol3fs.txt plain text: cord-012975-u87ol3fs.txt item: #20 of 119 id: cord-014461-2ubh9u8r author: Nelson, Oranmiyan W. 
title: Genome sequences published outside of Standards in Genomic Sciences, July - October 2012 date: 2012-10-10 words: 4132 flesch: 32 summary: carotovorum Bacteriophage PP1 Complete Genome Sequences of Two Persicivirga Bacteriophages, P12024S and P12024L Genome sequence of the phage clP1, which infects the beer spoilage bacterium Pediococcus damnosus Complete Genome Sequence of Pseudomonas aeruginosa Siphophage MP1412 Complete Genome Sequences of Two Pseudomonas aeruginosa Temperate Phages, MP29 and MP42, Which Lack the Phage-Host CRISPR Interaction Genome Sequence of the Broad-Host-Range Pseudomonas Phage Φ-S1 Complete Genome Sequence of Pseudomonas aeruginosa Siphophage MP1412 Complete Genome Sequence of Staphylococcus aureus Bacteriophage GH15 Complete Genome Sequence of Vibrio vulnificus Bacteriophage SSP002 Whole genome sequence analyses of three African bovine rotaviruses reveal that they emerged through multiple reassortment events between rotaviruses from different mammalian species Complete Genome Sequence of an Avian Leukosis Virus Isolate Associated with Hemangioma and Myeloid Leukosis in Egg-Type and Meat-Type Chickens Genome Sequence of a Novel Reassortant H3N2 Avian Influenza Virus in Southern China Complete Genome Sequence of an H5N2 Avian Influenza Virus Isolated from a Parrot in Southern China Complete Genome Sequence of an Avian-Like H4N8 Swine Influenza Virus Discovered in Southern China Complete Genome Sequence of a Novel Avian Paramyxovirus Complete Genome Sequence of Avian Tembusu-Related Virus Strain WR Isolated from White Kaiya Ducks in Fujian Complete Genome Sequence of Bluetongue Virus Serotype 9: Implications for Serotyping Complete Genome Sequence of Bluetongue Virus Serotype 16 of Goat Origin from India Genome Sequence of a Bombyx mori Nucleopolyhedrovirus Strain with Cubic Occlusion Bodies Complete Genome Sequence of a Bovine Viral Diarrhea Virus 2 from Commercial Fetal Bovine Serum Complete Genome Sequences of Two Novel European 
Clade Bovine Foamy Viruses from Germany and Poland Complete Genome Sequences of Novel Canine Noroviruses in Hong Kong Complete Genome Sequence Analysis of a Recent Chicken Anemia Virus Isolate and Comparison with a Chicken Anemia Virus Isolate from Human Fecal Samples in China Complete Genome Sequence of a Chikungunya Virus Isolated in Guangdong Complete Genome Sequences of Two Chinese Virulent Avian Coronavirus Infectious Bronchitis Virus Variants Complete Genome Sequence of a Recombinant Coxsackievirus B4 from a Patient with a Fatal Case of Hand, Foot, and Mouth Disease in Guangxi Complete Genome Sequence of a Novel Human Enterovirus C (HEV-C117) Identified in a Child with Community-Acquired Pneumonia Complete Genome Sequence of the Genotype 4 Hepatitis E Virus Strain Prevalent in Swine in Jiangsu Province, China, Reveals a Close Relationship with That from the Human Population in This Area Complete Genome Sequence of an H10N8 Avian Influenza Virus Isolated from a Live Bird Market in Southern China Complete Genome Sequence of a Novel H9N2 Subtype Influenza Virus FJG9 Strain in China Reveals a Natural Reassortant Event Characterization and Complete Genome Sequence of Human Coronavirus NL63 Isolated in China Whole genome sequence analyses of three African bovine rotaviruses reveal that they emerged through multiple reassortment events between rotaviruses from different mammalian species Complete Genome Sequence of Ikoma Lyssavirus Analysis of the complete genome sequence of two Korean sacbrood viruses in the Honey bee, Apis mellifera The complete mitochondrial genome sequence of the western flower thrips Frankliniella occidentalis (Thysanoptera: Thripidae) contains triplicate putative control regions Genome Sequence of Methylobacterium sp. 
Complete Genome Sequence of a Street Rabies Virus from Mexico Genome sequence of a waterfowl aviadenovirus, goose adenovirus 4 keywords: accession; avian; bacillus; bacteriophage; bacterium; china; complete; draft; genome; mycobacterium; plasmid; porcine; pseudomonas; sequence; sequence accession; staphylococcus; strain; streptococcus; subsp; virus cache: cord-014461-2ubh9u8r.txt plain text: cord-014461-2ubh9u8r.txt item: #21 of 119 id: cord-014462-11ggaqf1 author: None title: Abstracts of the Papers Presented in the XIX National Conference of Indian Virological Society, “Recent Trends in Viral Disease Problems and Management”, on 18–20 March, 2010, at S.V. University, Tirupati, Andhra Pradesh date: 2011-04-21 words: 35463 flesch: 47 summary: The following virus isolates have been used in the analysis: GTPV-Uttarkashi, P60, vaccine virus; GTPV Mukteswar, P10, Challenge virus; GTPV (Akola), GTPV Bareilly/00, GTPV Ladakh/01 and GTPV Sambalpur/82, field isolates and SPPV Srinagar, P40; SPPV Ranipet, P50; SPPV-RF, P50, vaccine viruses and SPPV Makdhoom/07, SPPV CIRG/08, SPPV Pune/08, SPPV Bareilly, SPPV 183/03 and SPPV 125/02, field isolates. The present paper discusses virus diseases of quarantine importance affecting ornamental and fruit plants such as Chrysanthemum, Dahlia, Dianthus, Rosa bengalensis, Cattleya, Cymbidium, Dendrobium, Lilium, Citrus, Vitis etc.
keywords: acid; analysis; animals; antibodies; antigen; assay; cases; cells; cloned; control; crop; curl; dengue; detection; development; disease; dna; elisa; expression; field; food; gene; host; india; infection; isolates; leaf; management; methods; molecular; mosaic; mosaic virus; nucleotide; pathogens; patients; pcr; plant; positive; present; primers; production; protein; region; resistance; response; results; rna; samples; sequence; specific; study; symptoms; time; tomato; total; vaccine; vector; viral; virus; virus infection; viruses; world; yellow cache: cord-014462-11ggaqf1.txt plain text: cord-014462-11ggaqf1.txt item: #22 of 119 id: cord-014674-ey29970v author: None title: Dreizehnter Bericht nach Inkrafttreten des Gentechnikgesetzes (GenTG) für den Zeitraum vom 1.1.2002 bis 31.12.2002 : Die Arbeit der Zentralen Kommission für die Biologische Sicherheit (ZKBS) im Jahr 2002 date: 2003 words: 2525 flesch: 47 summary: and therefore is not expected to allow specific amplification of p-35S sequences.] In the sequences of the amplification products AF434754, -55, -56, -57, in which the iPCR primer sequences can be identified, the nucleotide sequences ahead of the primers are not from p-35S. The expected p-35S sequence is only partially present ahead of iCVM1 in AF434758.
keywords: der; des; die; dna; für; gentechnik; ipcr; maize; p-35s; sequences; und cache: cord-014674-ey29970v.txt plain text: cord-014674-ey29970v.txt item: #23 of 119 id: cord-015850-ef6svn8f author: Saitou, Naruya title: Eukaryote Genomes date: 2013-08-22 words: 7442 flesch: 48 summary: The complete nucleotide sequence of the tobacco mitochondrial genome: Comparative analysis of mitochondrial genomes in higher plants and multipartite organization Widespread horizontal transfer of mitochondrial genes in flowering plants Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin Small, repetitive DNAs contribute significantly to the expanded mitochondrial genome of cucumber The complete nucleotide sequence of the tobacco chloroplast genome: Its gene organization and expression Changes in the structure of DNA molecules and the amount of DNA per plastid during chloroplast development in maize Pattern of organization of human mitochondrial pseudogenes in the nuclear genome Why genes in pieces? Introns. As for plants, Kaplinsky [62] compared genome sequences of Arabidopsis, grape, rice, and Brachypodium and found >100 times more abundant CNSs from monocots than dicots.
keywords: dna; duplication; eukaryotes; evolution; genes; genome; human; introns; junk; number; plants; protein; rna; sequence; size; species; type cache: cord-015850-ef6svn8f.txt plain text: cord-015850-ef6svn8f.txt item: #24 of 119 id: cord-016293-pyb00pt5 author: Newell-McGloughlin, Martina title: The flowering of the age of Biotechnology 1990–2000 date: 2006 words: 22413 flesch: 45 summary: These DNA chips have broad commercial applications and are now used in many areas of basic and clinical research including the detection of drug resistance mutations in infectious organisms, direct DNA sequence comparison of large segments of the human genome, the monitoring of multiple human genes for disease associated mutations, the quantitative and parallel measurement of mRNA expression for thousands of human genes, and the physical and genetic mapping of genomes. Of course for such a radical approach certain basal level criteria needed to be established for selecting disease candidates for human gene therapy. keywords: animal; biology; biotechnology; cancer; cells; company; data; development; disease; dna; drug; expression; food; gene; gene therapy; genome; human; influenza; information; level; molecular; nih; number; plant; production; products; project; protein; research; rna; scientists; sequence; sequencing; stem cells; studies; system; techniques; technology; therapy; time; transfer; transgenic; university; use; virus; year cache: cord-016293-pyb00pt5.txt plain text: cord-016293-pyb00pt5.txt item: #25 of 119 id: cord-016594-lj0us1dq author: Flower, Darren R. 
title: Identification of Candidate Vaccine Antigens In Silico date: 2012-09-28 words: 12575 flesch: 33 summary: A long, naturally presented immunodominant epitope from NY-ESO-1 tumor antigen: implications for cancer vaccine design Identification and characterization of pathogenicity and other genomic islands using base composition analyses A novel strategy for the identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites in closely related bacteria MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands CpGcluster: a distance-based algorithm for CpG-island detection CpGIF: an algorithm for the identification of CpG islands Identifying CpG islands by different computational techniques CpG_MI: a novel approach for identifying functional CpG islands in mammalian genomes Evaluation of genomic island predictors using a comparative genomics approach IslandPath: aiding detection of genomic islands in prokaryotes Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models A computational approach for identifying pathogenicity islands in prokaryotic genomes Resolving the structural features of genomic islands: a machine learning approach Detection of genomic islands via segmental genome heterogeneity Prediction of pathogenicity islands in enterohemorrhagic Escherichia coli O157:H7 using genomic barcodes IslandViewer: an integrated interface for computational identification and visualization of genomic islands Towards pathogenomics: a web-based resource for pathogenicity islands Identification and characterization of a novel family of pneumococcal proteins that are protective against sepsis Functional genomics of pathogenic bacteria SYFPEITHI: database for searching and Tcell epitope prediction SYFPEITHI: database for MHC ligands and peptide motifs HIV sequence databases MHCBN 4.0: a database of MHC/TAP binding peptides and T-cell epitopes MHCBN: a 
comprehensive database of MHC binding and non-binding peptides EPIMHC: a curated database of MHCbinding peptides for customized computational vaccinology AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data JenPep: a novel computational information resource for immunobiology and vaccinology JenPep: a database of quantitative functional peptide data for immunology The immune epitope database 2.0 AntigenDB: an immunoinformatics database of pathogen antigens VIOLIN: vaccine investigation and online information network Epitopic peptides with low similarity to the host proteome: towards biological therapies without side effects Peptimmunology: immunogenic peptides and sequence redundancy Primer: mechanisms of immunologic tolerance Recent advances in immune modulation Cutting edge: contributions of apoptosis and anergy to systemic T cell tolerance Discriminating antigen and non-antigen using proteome dissimilarity III: tumour and parasite antigens Discriminating antigen and non-antigen using proteome dissimilarity II: viral and fungal antigens Discriminating antigen and non-antigen using proteome dissimilarity: bacterial antigens Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Single proteins might have dual but related functions in intracellular and extracellular microenvironments Locating proteins in the cell using TargetP, SignalP and related tools Improved prediction of signal peptides: SignalP 3.0 A comprehensive assessment of N-terminal signal peptides prediction methods WoLF PSORT: protein localization predictor Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization Predicting protein subcellular locations using hierarchical ensemble of Bayesian 
classifiers based on Markov chains SubLoc: a server/client suite for protein subcellular location based on SOAP Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins Advantages of combined transmembrane topology and signal peptide prediction-the Phobius web server Prediction of lipoprotein signal peptides in Gram-negative bacteria Prediction of twin-arginine signal peptides Validating subcellular localization prediction tools with mycobacterial proteins Toward bacterial protein sub-cellular location prediction: single-class discrimminant models for all gram-and gram+ compartments Multi-class subcellular location prediction for bacterial proteins Alpha helical trans-membrane proteins: enhanced prediction using a Bayesian approach Beta barrel trans-membrane proteins: enhanced prediction using a Bayesian approach A predictor of membrane class: discriminating alpha-helical and beta-barrel membrane proteins from non-membranous proteins TATPred: a Bayesian method for the identification of twin arginine translocation pathway signal sequences LIPPRED: a web server for accurate prediction of lipoprotein signal sequences and cleavage sites Combining algorithms to predict bacterial protein sub-cellular location: parallel versus concurrent implementations Predicting the subcellular localization of viral proteins within a mammalian host cell Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells Structure and sequence relationships in the lipocalins and related proteins Structural Relationship of Streptavidin to the Calycin Protein Superfamily Analysis of known bacterial protein vaccine antigens reveals biased physical properties and amino acid composition Adaptation of protein surfaces to subcellular location Hierarchical classification of G-protein-coupled receptors with data-driven selection of attributes and classifiers GPCRTree: online 
hierarchical classification of GPCR function Optimizing amino acid groupings for GPCR classification On the hierarchical classification of G protein-coupled receptors Proteomic applications of automated GPCR classification VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties DNA and peptide sequences and chemical processes multivariately modeled by principal component analysis and partial least-squares projections to latent structures Principal property-values for 6 nonnatural amino-acids and their application to a structure activity relationship for oxytocin peptide analogs Peptide binding to the HLA-DRB1 supertype: a proteochemometrics analysis Proteochemometrics mapping of the interaction space for retroviral proteases and their substrates Proteochemometrics analysis of substrate interactions with dengue virus NS3 proteases Generalized modeling of enzyme-ligand interactions using proteochemometrics and local protein substructures Rough set-based proteochemometrics modeling of G-protein-coupled receptor-ligand interactions Improved approach for proteochemometrics modeling: application to organic compound-amine G protein-coupled receptor interactions Melanocortin receptors: ligands and proteochemometrics modeling Proteochemometrics modeling of the interaction of amine G-protein coupled receptors with a diverse set of ligands Peptide quantitative structureactivity-relationships, a multivariate approach Multivariate parametrization of 55 coded and non-coded amino-acids New chemical descriptors relevant for the design of biologically active peptides. Vaccines based on APCs and peptides are new but unproven strategies; most modern vaccine development relies instead on effective searches for vaccine antigens. 
keywords: analysis; antigens; approach; binding; candidate; cell; data; database; discovery; epitope; genome; genomic; host; immunogenicity; islands; methods; mhc; peptide; prediction; protein; sequence; system; vaccines; vaccinology cache: cord-016594-lj0us1dq.txt plain text: cord-016594-lj0us1dq.txt item: #26 of 119 id: cord-016798-tv2ntug6 author: Gautam, Ablesh title: Bioinformatics Applications in Advancing Animal Virus Research date: 2019-06-06 words: 6983 flesch: 35 summary: VIDA retrieves virus sequences from GenBank and the files are parsed into subfields. VIDA also provides functional classification of virus proteins into broad functional classes based on typical virus processes such as DNA and RNA replication, virus structural proteins, nucleotide and nucleic acid metabolism, transcription, glycoproteins and others. keywords: analysis; annotation; bioinformatics; database; et al; gene; genome; host; influenza; information; prediction; proteins; sequence; tools; virus; viruses; web cache: cord-016798-tv2ntug6.txt plain text: cord-016798-tv2ntug6.txt item: #27 of 119 id: cord-017354-cndb031c author: Janies, D. title: Large-Scale Phylogenetic Analysis of Emerging Infectious Diseases date: 2008 words: 12430 flesch: 42 summary: Here we review the computational challenges of comparative genomic analyses, specifically sequence alignment and reconstruction of phylogenetic trees. Phylogenetic trees are represented by acyclic graphs in which the leaves of these graphs represent the observed biological entities (taxa) being compared (e.g., sequences of genes, genomes, and/or anatomy of individuals, isolates or cultivars, species, or any higher level taxonomic unit). 
keywords: alignment; analysis; avian; character; data; host; human; influenza; isolates; length; methods; number; organisms; outgroup; search; sequence; strains; taxa; tree; viruses cache: cord-017354-cndb031c.txt plain text: cord-017354-cndb031c.txt item: #28 of 119 id: cord-017584-9rx4jlw8 author: Kim, Kwangsoo title: Selecting Genotyping Oligo Probes Via Logical Analysis of Data date: 2007 words: 3665 flesch: 50 summary: In brief, the probe design methods of [2] and [27] required several CPU hours of computation and selected probes that obtained 85.6% and 81.1% correct classification rates, respectively. We used the three influenza virus N subtypes with 30 or more samples in Table 1 and selected monospecific probes for their classification. keywords: classification; data; probes; sequences; target cache: cord-017584-9rx4jlw8.txt plain text: cord-017584-9rx4jlw8.txt item: #29 of 119 id: cord-017932-vmtjc8ct author: Georgiev, Vassil St. title: Genomic and Postgenomic Research date: 2009 words: 8483 flesch: 30 summary: Next, these gene predictions can be further refined by searching for nearby regulatory sites such as the ribosome-binding sites, as well as by aligning protein sequences to other species. Large-scale prepublication information on genome sequences is a unique research resource for the scientific community, and rapid and unrestricted sharing of microbial genome sequence data is essential for advancing research on infectious agents responsible for human disease. keywords: analysis; centers; coli; data; diseases; genes; genome; genomic; host; human; influenza; microbial; niaid; proteins; research; sequence; sequencing cache: cord-017932-vmtjc8ct.txt plain text: cord-017932-vmtjc8ct.txt item: #30 of 119 id: cord-018133-2otxft31 author: Altman, Russ B. title: Bioinformatics date: 2006 words: 9594 flesch: 44 summary: Computer systems within bioinformatics thus must be able to handle biological sequence information effectively and efficiently. 
Nonetheless, the effects of sequence information on clinical databases will be significant. keywords: analysis; bioinformatics; data; databases; dna; function; genes; genome; human; information; knowledge; molecules; protein; sequence; structure cache: cord-018133-2otxft31.txt plain text: cord-018133-2otxft31.txt item: #31 of 119 id: cord-018459-isbc1r2o author: Munjal, Geetika title: Phylogenetics Algorithms and Applications date: 2018-12-10 words: 1853 flesch: 36 summary: The limitations associated with sequence alignment methods lead to the development of alignment-free sequence analysis. Multiple sequence alignment methods emphasize that more closely related sequences should be aligned first. keywords: alignment; methods; sequences; species; tree cache: cord-018459-isbc1r2o.txt plain text: cord-018459-isbc1r2o.txt item: #32 of 119 id: cord-018963-2lia97db author: Xu, Ying title: Protein Structure Prediction by Protein Threading date: 2010-04-29 words: 15314 flesch: 39 summary: The protein threading problem with sequence amino acid interaction preferences is NP-complete Introduction to Protein Architecture: The Structural Biology of Proteins A unified statistical framework for sequence comparison and structure comparison Emergence of preferred structures in a simple model of protein folding Are protein folds atypical?
Designability of protein structures: A lattice-model study using the Miyazawa-Jernigan matrix A distance-dependent atomic knowledge-based potential for improved protein structure selection Geometric cooperativity and anti-cooperativity of three-body interactions in native proteins Multimeric threading-based prediction of protein-protein interactions on a genomic scale: Application to the Saccharomyces cerevisiae proteome Protein distance constraints predicted by neural networks and probability density functions Pcons: A neural-network-based consensus predictor that improves fold recognition Threading analysis suggests that the obese gene product may be a helical cytokine Comparative genomics of the Archaea (Euryarchaeota): Evolution of conserved protein families, the stable core, and the variable shell How many species are there on earth Improvement of the GenTHREADER method for genomic fold recognition The Genomic Threading Database: A comprehensive resource for structural annotations of the genomes from key organisms Novel knowledge-based mean force potential at atomic level Statistical significance of protein structure prediction by threading Statistical significance of hierarchical multibody potentials based on Delaunay tessellation and their application in sequence-structure alignment SCOP: A structural classification of proteins database for the investigation of sequences and structures Protein superfamilies and domain superfolds CATH-A hierarchic classification of protein domain structures A local alignment method for protein structure motifs Threading with explicit models for evolutionary conservation of structure and sequence Combination of threading potentials and sequence profiles improves fold recognition Combinatorial Optimization: Algorithms and Complexity New techniques in structural NMR-anisotropic interactions Protein fold recognition through application of residual dipolar coupling data Protein structure 
prediction using sparse dipolar coupling data The anatomy and taxonomy of protein structure Graph minors. II. To keep up with the rate at which protein structures are being solved, there is a clear need for more automated domain-partitioning methods to process the newly solved structures. keywords: algorithm; alignment; amino; decomposition; energy; et al; families; fold; function; graph; number; prediction; problem; protein; protein structure; protein threading; query; sequence; structure; template; threading; tree cache: cord-018963-2lia97db.txt plain text: cord-018963-2lia97db.txt item: #33 of 119 id: cord-022348-w7z97wir author: Sola, Monica title: Drift and Conservatism in RNA Virus Evolution: Are They Adapting or Merely Changing? date: 2007-09-02 words: 10898 flesch: 50 summary: Muller's ratchet decreases fitness of a DNA-based microbe Increased immune response elicited by DNA vaccination with a synthetic gp120 sequence with optimized codon usage The phylogeny of The Canterbury Tales Isolation of new ribozymes from a large pool of random sequences Forced evolution of a regulatory RNA helix in the HIV-1 genome Role of the first and third extracellular domains of CXCR-4 in human immunodeficiency virus coreceptor activity Molecular Mechanisms of Immune Responses in Insects Nucleotide composition as a driving force in the evolution of retroviruses Unusually high frequency of Epstein-Barr virus genetic variants in Papua New Guinea that can escape cytotoxic T-cell recognition: implications for virus evolution Role of host immune response in selection of equine infectious anemia virus variants Fitness of RNA virus decreased by Muller's ratchet Evolution of sex and the molecular clock in RNA viruses HIV and T-cell expansion in splenic white pulps is accompanied by infiltration of HIV-specific cytotoxic T-lymphocytes Antigenic stimulation by BCG as an in vivo driving force for SIV replication and dissemination Genetic bottlenecks and population passages cause profound 
fitness differences in RNA viruses Nucleotide sequences of three Nodavirus RNA2's: the messengers for their coat protein precursors Primary and secondary structure of black beetle virus RNA2, the genomic messenger for BBV coat protein precursor HLA-A11 epitope loss isolates of Epstein-Barr virus from a highly Al1+ population T cell responses and virus evolution: loss of HLA All-restricted CTL epitopes in Epstein-Barr virus isolates from highly All-positive populations by selective mutation of anchor residues RNA virus quasispecies populations can suppress vastly superior mutant progeny The genome sequence of herpes simplex virus type 2 RNA viral mutations and fitness for survival Basic concepts in RNA virus evolution Origins and evolutionary relationships of retroviruses Rates of spontaneous mutations among RNA viruses Rapid fitness losses in mammalian RNA virus clones due to Muller's ratchet High viral load and CD4 lymphopenia in rhesus and cynomolgus macaques infected by a chimeric primate lentivirus constructed using the env, rev, tat, and vpu genes from HIV-1 Lai The viral quasispecies Sequence space and quasispecies distribution Structurally complex and highly active RNA ligases derived from random RNA sequences Does the VP1 gene of foot-and-mouth disease virus behave as a molecular clock? key: cord-022348-w7z97wir authors: Sola, Monica; Wain-Hobson, Simon title: Drift and Conservatism in RNA Virus Evolution: Are They Adapting or Merely Changing? date: 2007-09-02 journal: Origin and Evolution of Viruses DOI: 10.1016/b978-012220360-2/50007-6 sha: doc_id: 22348 cord_uid: w7z97wir This chapter argues that the vast majority of genetic changes or mutations fixed by RNA viruses are essentially neutral or nearly neutral in character. 
keywords: acid; amino; et al; evolution; example; figure; fitness; genomes; hiv; human; immunodeficiency; mutations; number; proteins; rna; selection; sequence; substitutions; variation; virus; viruses; vivo cache: cord-022348-w7z97wir.txt plain text: cord-022348-w7z97wir.txt item: #34 of 119 id: cord-022494-d66rz6dc author: Webb, B. title: Comparative Modeling of Drug Target Proteins date: 2014-10-01 words: 8784 flesch: 45 summary: 19, 20 Computational protein structure prediction methods, such as threading 21 and comparative protein structure modeling, 22, 23 strive to bridge the sequence-structure gap by utilizing these evolutionary relationships. 9 Shown are the different ranges of applicability of comparative protein structure modeling, threading, and de novo structure prediction, their corresponding accuracies, and their sample applications. keywords: accuracy; alignment; docking; drug; errors; identity; ligand; methods; modeling; models; protein; sequence; structure; target; template cache: cord-022494-d66rz6dc.txt plain text: cord-022494-d66rz6dc.txt item: #35 of 119 id: cord-023208-w99gc5nx author: None title: Poster Presentation Abstracts date: 2006-09-01 words: 71178 flesch: 41 summary: Peptide structures can be approached by spectroscopy and NMR techniques but data from these approaches too frequently diverge. To increase the stability and the therapeutic efficacy of peptide sequences from myelin oligodendrocyte protein (MOG) that act as multiple sclerosis (MS) antigens, we grafted them onto a framework of a particularly stable class of peptides, the cyclotides. 
keywords: acid; activation; activity; affinity; aggregation; aim; alpha; amino; amino acid; analogues; analysis; approach; arg; assay; beta; binding; blood; bond; cancer; cell; chain; chemical; chemistry; complex; complexes; compounds; concentration; conformational; conjugates; cyclic; data; derivatives; design; development; disulfide; dna; domain; effect; enzyme; epitope; fmoc; fragment; gly; group; growth; hplc; human; inhibitors; integrin; interaction; ligands; mechanism; membrane; method; mice; model; molecular; molecules; native; new; nmr; non; novel; opioid; order; patients; peptide; peptide analogues; peptide chain; peptide synthesis; phase; phase peptide; phe; position; potential; prepared; presence; products; proline; properties; protein; reaction; receptor; residues; results; role; sequence; site; solution; specific; spectroscopy; stability; strategy; structure; studies; study; surface; synthetic; system; target; terminal; therapeutic; treatment; tumor; turn; type; tyr; use; vivo; water; work cache: cord-023208-w99gc5nx.txt plain text: cord-023208-w99gc5nx.txt item: #36 of 119 id: cord-023209-un2ysc2v author: None title: Poster Presentations date: 2008-10-07 words: 112272 flesch: 42 summary: A specific bioassay was developed for screening peptide activity in high-salinity conditions in order to evaluate the inhibition of biofilm growth, based on growing biofilm-forming bacteria in a 96-well microtiter plate. Insight into the molecular mechanism of peptide activity is obtained in vitro using the SAXS method and artificial systems mimicking a bacterial cytoplasmic membrane.
keywords: acid peptide; acid residues; acids; activation; activities; activity; affi; agents; aggregation; aim; ala; amide; amino; amino acid; amyloid; analogs; analogues; analysis; antimicrobial; application; approach; assay; backbone; binding; biological; blood; bond; brain; c peptide; cancer; cation; cell; chemical; chemistry; cient; city; coil; complex; compounds; concentration; conditions; confi; conformational; conjugates; coupling; cyclic; data; delivery; derivatives; design; development; diseases; dna; domain; drug; ed peptides; effect; effi; experiments; family; fmoc; formation; fragments; free; function; gly; group; helix; hplc; human; identifi; infl; inhibition; inhibitors; interaction; interest; leu; ligation; lipid; mass; mechanism; membrane; method; microwave; model; model peptides; modifi; modifi ed; molecules; native; natural; new; nity; nmr; non; novel; number; order; peptide; peptide analogues; peptide bond; peptide chain; peptide chemistry; peptide fragments; peptide library; peptide ligands; peptide sequence; peptide structure; peptide synthesis; phase peptide; phase synthesis; phe; position; positive; potential; presence; present; process; processes; properties; protein; purifi; range; reaction; receptor; recognition; region; report; residues; results; role; rst; selective; sequence; signifi; site; solution; specifi c; spectroscopy; stability; stable; strategy; structure; studies; study; surface; synthetic; system; target; terminal; terminus; therapy; time; treatment; trp; tumor; type; university; uorescence; uptake; use; vitro; vivo; water; work cache: cord-023209-un2ysc2v.txt plain text: cord-023209-un2ysc2v.txt item: #37 of 119 id: cord-023647-dlqs8ay9 author: None title: Sequences and topology date: 2003-03-21 words: 4522 flesch: 38 summary: A 32-kDa Llpo~ortin from Human Mononuclear Cells Appears to be Identical with the Placental Inhibitor of Blood Coagulation Distinct Fercedoxins from Rhodobacter-Capsulstus -Complete Amino Acid Sequences 
and Molecular Evolution Peptide Sequence Analysis and Molecular Cloning Reveal Two Calcium Pump Isoforms in the Human Erythrocyte Membrane Cloning and Characterization of a Novel Member of the Cytochrome-P450 Subfamily IVA in Rat Prostate A Directly Repeated Sequence in the ~-Globin Promoter Regulates Transcription in Murine Erythroleukemia Cells Isolation and Characterization of the Alkane-Inducible NADPH-Cytochrome-P-450 Oxidoreductase Gene from Candida tropicalis - Identification of Invariant Residues Within Similar Amino Acid Sequences of Divergent Flavoproteins Protein Kinase-C Inhibitor Proteins - Purification from Sheep Brain and Sequence Similarity to Lipocortins and 14-3-3 MCI~ AVEmL~ B& Sequence Homology Between Purple Acid Phosphatases and Phosphoprotein Phosphatases - Are Phosphoprotein Phosphatases Metalloproteins Containing Oxide-bridged Dinuclear Metal Centers Negative Regulation of the Human ~-Globin Gene by Transcriptional Interference: Role of an Alu Repetitive Element Amino Acid Sequence of Chicken Calsequestrin Deduced from cDNA - Comparison of Calsequestrin and Aspartactin Calsequestrin, an Intracellular Calcium-binding Protein of Skeletal Muscle Sarcoplasmic Reticulum, Is Homologous to ~, a Putative Laminin-binding Protein of the Extracellular Matrix BOvSm~ ]Prote~ C Inhihl.gog with Structugll and Fun~ HotDoIO~OU~ ]~-.gtl~ to Hum~zn The 18S Ribosomal RNA Sequence of the Sea Anemone Anemonia sulcata and Its Evolutionary Position Among Other Eukaryotes Inferred from Sequence Comparisons of a Heat Shock Gene in Two Nematodes The l~'/O Multigene Family of Ok~hag of CDNA ~ for the ~ Omin of Human Complement Component ca~bi~una Protein, seqaenoe Homolo~ with thc a C~t~:~a~h Proc Natl Acad Sci USA 1990 Highly Conserved Core Domain and Unique N Terminus with Presumptive Regulatory Motifs in a Human TATA Factor (TFIID) keywords: acid; amino; amino acid; analysis; cell; conserved; dna; domain; evolution; factor; family; gene; homology; human; member; 
new; novel; protein; rna; sequence; similarity; structure; virus; yeast cache: cord-023647-dlqs8ay9.txt plain text: cord-023647-dlqs8ay9.txt item: #38 of 119 id: cord-025610-7vouj8pp author: Latif, Seemab title: Backward-Forward Sequence Generative Network for Multiple Lexical Constraints date: 2020-05-06 words: 3924 flesch: 44 summary: However, generating sequence from pre-specified lexical constraints is a new, challenging and less researched area in NLG. Our proposed approach shows lower perplexity than CGMH sampling method for sentence generation through keywords/constraints 1 to 3, while with 4 constraints as input CGMH shows slightly better result than our approach of generating sequence with verb constraint and during inference replacing the words in sequence with closest embedding similarity. keywords: backward; constraints; forward; language; model; sequence; word cache: cord-025610-7vouj8pp.txt plain text: cord-025610-7vouj8pp.txt item: #39 of 119 id: cord-025948-6dsx7pey author: Maitra, Arindam title: Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility date: 2020-06-04 words: 7221 flesch: 48 summary: Viral RNA sequences obtained from two samples S11 and S12 shared all mutations except a V32L mutation at ORF8 harboured by S11 and not by S12. The sequencing reads obtained in shotgun RNA-Seq experiment were mapped to reference viral sequence, variants detected and consensus sequence for each sample built using Dragen RNA pathogen detection software (version 9) in BaseSpace (Illumina Inc, USA). 
keywords: binding; chain; clade; cov-2; d614; genome; india; mirnas; mutations; protein; rna; samples; sars; sequences cache: cord-025948-6dsx7pey.txt plain text: cord-025948-6dsx7pey.txt item: #40 of 119 id: cord-027316-echxuw74 author: Modarresi, Kourosh title: Detecting the Most Insightful Parts of Documents Using a Regularized Attention-Based Model date: 2020-05-22 words: 2117 flesch: 35 summary: Deep Learning Summit Standardization of featureless variables for machine learning models using natural language processing Generalized variable conversion using k-means clustering and web scraping An efficient deep learning model for recommender systems Effectiveness of Representation Learning for the Analysis of Human Behavior An evaluation metric for content providing models, recommendation systems, and online campaigns Combined Loss Function for Deep Convolutional Neural Networks A Randomized Algorithm for the Selection of Regularization Parameter. A neural probabilistic language model Theano: a CPU and GPU math expression compiler Audio chord recognition with recurrent neural networks A singular value thresholding algorithm for matrix completion Exact matrix completion via convex optimization Compressive sampling Long short-term memory-networks for machine reading Learning phrase representations using RNN encoder-decoder for statistical machine translation Framewise phoneme classification with bidirectional LSTM and other neural network architectures Generating sequences with recurrent neural networks The Elements of Statistical Learning; Data miNing, Inference and Prediction Handwritten digit recognition via deformable prototypes Gene Shaving' as a method for identifying distinct sets of genes with similar expression patterns Matrix Completion via Iterative Soft-Thresholded SVD Package 'impute'. 
keywords: embedding; encoder; learning; model; neural; translation cache: cord-027316-echxuw74.txt plain text: cord-027316-echxuw74.txt item: #41 of 119 id: cord-031957-df4luh5v author: dos Santos-Silva, Carlos André title: Plant Antimicrobial Peptides: State of the Art, In Silico Prediction and Perspectives in the Omics Era date: 2020-09-02 words: 16639 flesch: 34 summary: Thus, there is a need for computational framework methods to predict protein structures based on the knowledge of the sequence. In addition, in recent years, there has been impressive progress in the development of algorithms for protein folding that may aid in the prediction of protein structures from amino acid sequence information. keywords: acid; activity; amps; analysis; antifungal; approaches; binding; bonds; cysteine; database; defensins; disulfide; docking; family; figure; function; gene; identification; information; lipid; methods; modeling; models; motif; novel; pathogen; peptides; plant; potential; prediction; present; protein; residues; sequence; structure cache: cord-031957-df4luh5v.txt plain text: cord-031957-df4luh5v.txt item: #42 of 119 id: cord-033010-o5kiadfm author: Durojaye, Olanrewaju Ayodeji title: Potential therapeutic target identification in the novel 2019 coronavirus: insight from homology modeling and blind docking study date: 2020-10-02 words: 8149 flesch: 48 summary: Qualitative Model Energy Analysis (QMEAN) is a composite scoring function that describes protein structures on the basis of major geometrical aspects. 
A novel coronavirus and SARS Crystal structures of the main peptidase from the SARS coronavirus inhibited by a substrate-like aza-peptide epoxide Dissection study on the SARS 3C-like protease reveals the critical role of the extra domain in dimerization of the enzyme: defining the extra domain as a new target for design of highly-specific protease inhibitors 3C-like proteinase from SARS coronavirus catalyzes substrate hydrolysis by a general base mechanism Only one protomer is active in the dimer of SARS 3C-like proteinase Biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus 3C-like proteinase A trial of lopinavir-ritonavir in adults hospitalized with severe covid-19 EMBOSS: the European molecular biology open software suite SRS, an indexing and retrieval tool for flat file data libraries Issues in bioinformatics benchmarking: the case study of multiple sequence alignment HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 Toward the estimation of the absolute quality of individual protein structure models MolProbity: more and better reference data for improved all-atom structure validation Chapter 2: Protein Composition and Structure Modeling protein quaternary structure of homo-and hetero-oligomers beyond binary interactions by homology UCSF chimera-a visualization system for exploratory research and analysis Fasman GD (1974) Prediction of protein conformation Protein Identification and Analysis Tools on the ExPASy Server The rapid generation of mutation data matrices from protein sequences MEGA7: keywords: 2019; acid; amino; amino acid; binding; coronavirus; docking; model; ncov; protein; proteinase; sars; score; sequence; structure; target protein; template cache: cord-033010-o5kiadfm.txt plain text: cord-033010-o5kiadfm.txt item: #43 of 119 id: cord-035033-osjy88rc author: Aydin, Berkay title: Spatiotemporal 
event sequence discovery without thresholds date: 2020-11-09 words: 8236 flesch: 50 summary: In this work, we focus on spatiotemporal event sequences (STES) from event datasets that contain instances with region-based geometric representations. Given this information, the task of STES mining, in general, is interested in discovering spatiotemporal event sequences whose instance sequences are frequently repeated. keywords: algorithm; data; datasets; event; event sequences; follow; instances; mining; sequences; stess; threshold; time; values cache: cord-035033-osjy88rc.txt plain text: cord-035033-osjy88rc.txt item: #44 of 119 id: cord-102766-n6mpdhyu author: Alam, Md. Nafis Ul title: Short k-mer Abundance Profiles Yield Robust Machine Learning Features and Accurate Classifiers for RNA Viruses date: 2020-06-25 words: 3202 flesch: 47 summary: It has been 90 demonstrated that RNA-Seq data can be a very promising avenue for improving knowledge on 91 RNA viruses when leveraged by tactful algorithms Viral metagenomics Third generation sequencing: technology and its potential impact on 602 evolutionary biodiversity research Virus taxonomy: the database of the International Committee on 606 Nucleic Acids Research Accelerated Profile HMM Searches BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-610 seq Data Bridger: a new framework for de novo transcriptome assembly using 612 RNA-seq data Shannon: An Information-Optimal de Novo RNA-Seq Assembler rnaSPAdes: a de novo transcriptome assembler and 616 its application to RNA-Seq data IDBA-tran: a more robust de novo de Bruijn graph assembler for 621 transcriptomes with uneven expression levels SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq 623 reads De novo assembly and analysis of RNA-seq data Full-length transcriptome assembly from RNA-Seq data without a 627 reference genome Oases: robust de novo RNA-seq assembly across the dynamic range 629 of expression levels S1 Fig. 
keywords: data; feature; genomes; learning; machine; models; rna; sequence; viruses cache: cord-102766-n6mpdhyu.txt plain text: cord-102766-n6mpdhyu.txt item: #45 of 119 id: cord-103029-nc5yf6x4 author: Wichmann, Stefan title: Computational design of genes encoding completely overlapping protein domains: Influence of genetic code and taxonomic rank date: 2020-09-25 words: 8666 flesch: 47 summary: Other properties required for functional protein sequences can be inferred from the evolutionary information contained in sequence alignments of protein families. Constructed OLG sequences are also indistinguishable from natural sequences in terms of amino acid identity and secondary structure, while the minimum nucleotide change required for overprinting an overlapping sequence can be as low as 1.8% of the sequence. keywords: acid; amino; code; fig; genes; olg; olgs; overlapping; protein; sequences; structure cache: cord-103029-nc5yf6x4.txt plain text: cord-103029-nc5yf6x4.txt item: #46 of 119 id: cord-103297-4stnx8dw author: Widrich, Michael title: Modern Hopfield Networks and Attention for Immune Repertoire Classification date: 2020-08-17 words: 14116 flesch: 51 summary: A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning Explaining and interpreting LSTMs Solving the protein sequence metric problem Rank-loss support instance machines for miml instance annotation Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires Multiple instance learning: a survey of problem characteristics and applications VDJServer: a cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements Tetramer-visualized gluten-specific CD4+ T cells in blood as a potential diagnostic marker for coeliac disease without oral gluten challenge 
iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories Support-vector networks Quantifiable predictive features define epitope-specific T cell receptor repertoires On a model of associative memory with huge storage capacity BERT: pre-training of deep bidirectional transformers for language understanding Solving the multiple instance problem with axis-parallel rectangles Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire Predicting antigen-specificity of single T-cells based on TCR CDR3 regions. We apply random and attention-based subsampling of repertoire sequences to reduce over-fitting and decrease computational effort. keywords: attention; classification; data; datasets; deeprc; et al; hopfield; input; learning; lstm; methods; motif; networks; number; repertoire; search; sequences; table cache: cord-103297-4stnx8dw.txt plain text: cord-103297-4stnx8dw.txt item: #47 of 119 id: cord-193356-hqbstgg7 author: None title: cord-193356-hqbstgg7 date: None words: 14115 flesch: 51 summary: A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning Explaining and interpreting LSTMs Solving the protein sequence metric problem Rank-loss support instance machines for miml instance annotation Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires Multiple instance learning: a survey of problem characteristics and applications VDJServer: a cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements Tetramer-visualized gluten-specific CD4+ T cells in blood as a potential diagnostic marker for 
coeliac disease without oral gluten challenge iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories Support-vector networks Quantifiable predictive features define epitope-specific T cell receptor repertoires On a model of associative memory with huge storage capacity BERT: pre-training of deep bidirectional transformers for language understanding Solving the multiple instance problem with axis-parallel rectangles Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire Predicting antigen-specificity of single T-cells based on TCR CDR3 regions. We apply random and attention-based subsampling of repertoire sequences to reduce over-fitting and decrease computational effort. keywords: attention; classification; data; datasets; deeprc; et al; hopfield; input; learning; lstm; methods; motif; networks; number; repertoire; search; sequences; table cache: cord-193356-hqbstgg7.txt plain text: cord-193356-hqbstgg7.txt item: #48 of 119 id: cord-193910-7p3f3znj author: Zhang, Xiangxie title: Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification date: 2020-11-01 words: 7746 flesch: 59 summary: In the experiments, the performances of feature extraction using primers and random DNA sequences will be compared to several other machine learning approaches. Since 37 primers of HCV were acquired, we generated three groups of random DNA sequences, and each contains 37 DNA sequences. 
keywords: data; dna; dna sequences; extraction; feature; learning; method; model; results; sequences; string cache: cord-193910-7p3f3znj.txt plain text: cord-193910-7p3f3znj.txt item: #49 of 119 id: cord-203232-1nnqx1g9 author: Canturk, Semih title: Machine-Learning Driven Drug Repurposing for COVID-19 date: 2020-06-25 words: 5028 flesch: 49 summary: Using the National Center for Biotechnology Information virus protein database and the DrugVirus database, which provides a comprehensive report of broad-spectrum antiviral agents (BSAAs) and viruses they inhibit, we trained ANN models with virus protein sequences as inputs and antiviral agents deemed safe-in-humans as outputs. This undermined our assumption that drug trials are hierarchical; though, in reality this is usually the case. keywords: acid; amino; antivirals; cov-2; database; dataset; drug; models; sars; sequences; virus cache: cord-203232-1nnqx1g9.txt plain text: cord-203232-1nnqx1g9.txt item: #50 of 119 id: cord-213136-euv6pqh5 author: Singh, Kulveer title: Sequence Effects on Internal Structure of Droplets of Associative Polymers date: 2020-05-17 words: 4331 flesch: 51 summary: Similar time evolution is observed in all other systems with different polymer sequences and in all cases the time it takes a single droplet to form is below 20, 000. As we have shown before, this choice of interaction parameters guarantees phase separation via formation of polymer droplets. keywords: droplet; polymer; sequences; solvent; stickers cache: cord-213136-euv6pqh5.txt plain text: cord-213136-euv6pqh5.txt item: #51 of 119 id: cord-252347-vnn4135b author: Lee, Wai-Ming title: A Diverse Group of Previously Unrecognized Human Rhinoviruses Are Common Causes of Respiratory Illnesses in Infants date: 2007-10-03 words: 5718 flesch: 46 summary: Selection of the target region To identify a genomic region suitable for molecular typing of HRV, we analyzed all published HRV sequences. 
These results suggested HRV serotypes are stable and do not undergo influenza virus-like antigenic drift [7] . keywords: hrv; hrvs; human; new; pcr; region; sequences; serotypes; strains cache: cord-252347-vnn4135b.txt plain text: cord-252347-vnn4135b.txt item: #52 of 119 id: cord-253436-dz84icdc author: Wille, Michelle title: High Prevalence and Putative Lineage Maintenance of Avian Coronaviruses in Scandinavian Waterfowl date: 2016-03-03 words: 2020 flesch: 46 summary: Influenza A virus, avian paramyxovirus and avian coronavirus Multiple Alignment of DNA Sequences with MAFFT Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods Estimating maximum likelihood phylogenies with PhyML SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building FigTree v1.1.1: Tree figure drawing tool Diverse gammacoronaviruses detected in wild birds from Madagascar Detection and molecular characterization of infectious bronchitis-like viruses in wild bird populations Genetically diverse coronaviruses in wild bird populations of northern England Molecular identification and characterization of novel coronaviruses infecting graylag geese (Anser anser), feral pigeons (Columbia livia) and mallards (Anas platyrhynchos) Identification of avian coronavirus in wild aquatic birds of the central and eastern USA Surveillance of avian coronaviruses in wild bird populations of Korea Absence of coronaviruses, paramyxoviruses, and influenza A viruses in seabirds in the southwestern Indian Ocean Animal migration and infectious disease risk Juveniles and migrants as drivers for seasonal epizootics of avian influenza virus Global patterns of influenza a virus in wild birds The evolutionary genetics and emergence of avian influenza A viruses in wild birds Spatial, temporal, and species variation in prevalence of influenza A viruses in wild migratory birds We wish to thank and the duck 
trappers at Ottenby Bird Observatory and Jonas Waldenström for collecting and providing samples used in this study, Jonas Blomberg for kindly providing sequence, Mallard CoV sequences generated in this study are indicated with a filled circle and Scaup CoV sequences with an asterisk. We found a prevalence of 18.7% CoV, which is higher than the 0-15% reported previously in wild bird studies [11, 14, 15, [21] keywords: coronaviruses; cov; prevalence; sequences; species cache: cord-253436-dz84icdc.txt plain text: cord-253436-dz84icdc.txt item: #53 of 119 id: cord-254942-g51mjj2b author: Touati, Rabeb title: New methodology for repetitive sequences identification in human X and Y chromosomes date: 2020-10-19 words: 7718 flesch: 49 summary: Two-thirds of the human genome consists of repetitive DNA sequences The identification of repetitive DNA sequences is taking greater and greater importance these days. keywords: chromosomes; dna; dna sequences; fig; genome; human; image; patterns; repeat; scalogram; sequences; tandem cache: cord-254942-g51mjj2b.txt plain text: cord-254942-g51mjj2b.txt item: #54 of 119 id: cord-255194-4i9fc0r7 author: Djikeng, Appolinaire title: Viral genome sequencing by random priming methods date: 2008-01-07 words: 3778 flesch: 46 summary: A cutoff e value of 10 -25 was used to identify viral sequences which matched the reference genome. The work presented here demonstrates the utility of the random genome sequencing method for the generation of viral sequence from positive strand ssRNA (Human Rhinovirus, Turkey astrovirus) and negative strand ssRNA viruses (Newcastle disease virus), ssDNA (enterobacteriphage M13) and dsDNA viruses (woodchuck hepatitis virus and lambda phage). 
keywords: coverage; genome; method; sequence; sequencing; sispa; viral; viruses cache: cord-255194-4i9fc0r7.txt plain text: cord-255194-4i9fc0r7.txt item: #55 of 119 id: cord-255371-o9oxchq6 author: Nguyen, Thanh Thi title: Genomic Mutations and Changes in Protein Secondary Structure and Solvent Accessibility of SARS-CoV-2 (COVID-19 Virus) date: 2020-07-10 words: 5655 flesch: 52 summary: For the mutation detection purpose, we apply a dynamic programming algorithm to protein AA sequences to get global pairwise alignments between a reference sequence and a query sequence. There have been various protein secondary structure prediction programs in the literature and many of those were developed based on artificial intelligence models using protein AA sequences such as JPred4 [29] , Spider2 keywords: accessibility; cov-2; gene; mutations; number; protein; sars; sequences; solvent; structure; virus cache: cord-255371-o9oxchq6.txt plain text: cord-255371-o9oxchq6.txt item: #56 of 119 id: cord-256278-jvfjf7aw author: Feng, Jie title: New method for comparing DNA primary sequences based on a discrimination measure date: 2010-10-21 words: 2868 flesch: 42 summary: Analysis of genomic sequences by chaos game representation Universal sequence map (USM) of arbitrary discrete sequences Computing distribution of scale independent motifs in biological sequences Biological sequences as pictures: a generic two dimensional solution for iterated maps A measure of similarity of sets of sequences not requiring sequence alignment Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarities of natural sequences Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders Exploration of phylogenetic data using a global sequence analysis method Shared information and program plagiarism detection Algorithmic clustering of music based on string compression Markov model plus k-word distributions: a synergy 
that produces novel statistical measures for sequence comparison Genomic signature: characterization and classification of species assessed by chaos game representation of sequences Detection and characterization of horizontal transfers in prokaryotes using genomic signature H curves, a novel method of representation of nucleotides series especially suited for long DNA sequences Characteristic sequences for DNA primary sequence Metrics for comparing regulatory sequences on the basis of pattern counts Chaos game representation of gene structure Chaos game representation for comparison of whole genomes A statistical method for alignment free comparison of regulatory sequences Dinucleotide relative abundance extremes: a genomic signature Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances Directed graphs of DNA sequences and their numerical characterization 2-D graphical representation of protein sequences and its application to coronavirus phylogeny An information based sequence distance and its application to whole mitochondrial genome phylogeny A 2D graphical representation of DNA sequence A relative similarity measure for the similarity analysis of DNA sequences Characteristic distribution of L-tuple for DNA primary sequence An extension of the Burrows-Wheeler transform Distance measures for biological sequences: some recent approaches A new graphical representation and analysis of DNA sequence structure A new sequence distance measure for phylogenetic tree construction Improved tools for biological sequence comparison Spectral distortion measures for biological sequence comparisons and database searching A probabilistic measure for alignment-free sequence comparison Evolutionary implications of microbial genome tetranucleotide frequency biases Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach New 3D graphical representation of DNA sequence based on dual nucleotides On the similarty 
of DNA primary sequences On the characterization of DNA primary sequences by triplet of nucleic acid bases Novel 2-D graphical representation of DNA sequences and their numerical characterization Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation Quantifying the species-specificity in genomic signatures, synonymous codon choice, amino acid usage and G+C content Statistical analysis of L-tuple frequencies in eubacteria and organelles Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human Integrated gene and species phylogenies from unaligned whole genome protein sequences Application of tetranucleotide frequencies for the assignment of genomic fragments Alignment-free sequence comparison-a review The spectrum of genomic signatures: from dinucleotides to chaos game representation A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words Statistical measures of DNA dissimilarity under Markov chain models of base composition The Burrows-Wheeler similarity distribution between biological sequences based on Burrows-Wheeler transform TN curve: a novel 3D graphical representation of DNA sequence based on trinucleotides and its applications The Z curve database: a graphic representation of genome sequences Coronavirus phylogeny based on a geometric approach We thank all the anonymous referees for their valuable suggestions and support. In the first group, researchers represent DNA sequence by curves (Hamori and Ruskin, 1983; Nandy, 1994; Randic et al., 2003a; Zhang et al., 2003; Liao, 2005; Li et al., 2006; Qi et al., 2007; Yu et al., 2009), numerical sequences (He and Wang, 2002), or matrices (Randic, 2000; Randic et al., 2001). 
keywords: discrimination; dna; method; representation; sequences cache: cord-256278-jvfjf7aw.txt plain text: cord-256278-jvfjf7aw.txt item: #57 of 119 id: cord-256608-ajzk86rq author: van Weezep, Erik title: PCR diagnostics: In silico validation by an automated tool using freely available software programs date: 2019-05-13 words: 4953 flesch: 48 summary: To increase the accuracy of the alignment search (see Discussion), large sequences were fragmented in sequences of maximal 3000 nucleotides with an overlap of 50 nucleotides to prevent the loss of hits of primer or probe sequences spanning the split site. Primer and probe sequences were inserted in all possible combinations and orientations potentially initiating amplification ( Fig. 1 ). keywords: pcr; pcrv; primer; probe; sequences; silico; validation; virus cache: cord-256608-ajzk86rq.txt plain text: cord-256608-ajzk86rq.txt item: #58 of 119 id: cord-263987-ff6kor0c author: Holmes, Ian H. title: Solving the master equation for Indels date: 2017-05-12 words: 7132 flesch: 39 summary: Parameterizing sequence alignment with an explicit evolutionary model Multiple genome rearrangement and breakpoint phylogeny Analytical expression of the purine/pyrimidine codon probability after and before random mutations Analytical solutions of the dinucleotide probability after and before random mutations RNA secondary structure prediction using stochastic context-free grammars and evolutionary history Evolution probabilities and phylogenetic distance of dinucleotides Genome evolution by transformation, expansion and contraction (GETEC) An evolutionary model for maximum likelihood alignment of DNA sequences An introduction to probability theory and its applications Evolutionary HMMs: a Bayesian approach to multiple alignment Using guide trees to construct multiple-sequence evolutionary HMMs Accurate reconstruction of insertion-deletion histories by statistical phylogenetics A note on probabilistic models over strings: the linear 
algebra approach Statistical alignment based on fragment insertion and deletion models Evolutionary inference via the poisson indel process Inching toward reality: an improved likelihood model of sequence evolution Models of sequence evolution for DNA sequences containing gaps Evolutionary models for insertions and deletions in a probabilistic modeling framework Probabilistic phylogenetic inference with insertions and deletions A probabilistic model for the evolution of RNA structure Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures A probabilistic model for sequence alignment with context-sensitive indels Sequence alignments and pair hidden Markov models using evolutionary history Joint Bayesian estimation of alignment and phylogeny BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny Incorporating indel information into phylogeny estimation for rapidly emerging pathogens Phylogenetic automata, pruning, and multiple alignment Hand Align: Bayesian multiple sequence alignment, phylogeny, and ancestral reconstruction A long indel model for evolutionary sequence alignment An improved model for statistical alignment Chain Monte Carlo Expectation Maximization algorithm for statistical analysis of DNA sequence evolution with neighbor-dependent substitution rates Accurate estimation of substitution rates with neighbor-dependent models in a phylogenetic context Patterns of insertion and deletion in mammalian genomes Exhaustive matching of the entire protein sequence database Pattern and rate of indel evolution inferred from whole chloroplast intergenic regions in sugarcane, maize and rice Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes For indel models, this latent information must be extended to include hidden site boundaries [56] . 
keywords: alignment; distributions; evolution; finite; gap; indel; length; matrix; models; probability; sequence; state; time cache: cord-263987-ff6kor0c.txt plain text: cord-263987-ff6kor0c.txt item: #59 of 119 id: cord-264135-s2u76pvk author: Patel, Amrutlal K. title: Complete genome sequence analysis of chicken astrovirus isolate from India date: 2016-12-23 words: 3756 flesch: 36 summary: OBJECTIVE: The consensus length of 7513 bp genome sequence of Indian isolate of chicken astrovirus was obtained after assembly of 14,121 high quality reads. keywords: analysis; astrovirus; capsid; castv; chicken; genome; isolate; protein; sequence cache: cord-264135-s2u76pvk.txt plain text: cord-264135-s2u76pvk.txt item: #60 of 119 id: cord-264296-0x90yubt author: Sawmya, Shashata title: Analyzing hCov genome sequences: Applying Machine Intelligence and beyond date: 2020-06-03 words: 5017 flesch: 56 summary: Thus, every resulting time-step represents a date (Tk for Cluster k) and contains the clusters of genome sequences of the countries/states. Notably, we do not consider any alignment-based method since it is not computationally feasible for us to align thousands of viral sequences for analysis and clustering purposes [4]. 
keywords: analysis; coronavirus; countries; features; genome; learning; pipeline; sars; sequences; strain; tree cache: cord-264296-0x90yubt.txt plain text: cord-264296-0x90yubt.txt item: #61 of 119 id: cord-264746-gfn312aa author: Muse, Spencer title: GENOMICS AND BIOINFORMATICS date: 2012-03-29 words: 10983 flesch: 54 summary: In addition to providing storage and retrieval of gene sequences, several of these databases also offer advanced sequence analysis methods and powerful visualization tools. However, if two or more such distantly related organisms have gene sequences that are nearly identical, a strong argument can be made that the gene is critical in both organisms and that the same function has been maintained throughout evolutionary history. keywords: alignment; data; database; dna; expression; figure; gene; genome; genomic; human; levels; nucleotides; number; protein; rna; sequence cache: cord-264746-gfn312aa.txt plain text: cord-264746-gfn312aa.txt item: #62 of 119 id: cord-265857-fs6dj3dp author: Liu, Yu-Tsueng title: Infectious Disease Genomics date: 2010-12-24 words: 4346 flesch: 33 summary: S-OIV emerged in the spring of 2009 in Mexico and was also discovered in specimens from two unrelated children in the San Diego area in April 2009 (CDC, 2009; Dawood et al., 2009) . S-OIV has three genome segments (HA, NP, NS) from the classic North American swine (H1N1) lineage, two segments (PB2, PA) from the North American avian lineage, one segment (PB1) from the seasonal H3N2, and most notably, two segments (NA, M) from the Eurasian swine (H1N1) lineage (Dawood et al., 2009) . 
keywords: disease; et al; genome; human; malaria; mosquito; sequence; sequencing; vaccine; vector; virus cache: cord-265857-fs6dj3dp.txt plain text: cord-265857-fs6dj3dp.txt item: #63 of 119 id: cord-266288-buc4dd5y author: Dong, Rui title: A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance date: 2019-04-09 words: 5255 flesch: 55 summary: Fast algorithms for computing sequence distances by exhaustive substring composition A novel method of characterizing genetic sequences: genome space with biological distance and applications A new method to cluster genomes based on cumulative Fourier power spectrum Ecology, evolution and classification of bat coronaviruses in the aftermath of SARS A phylogenetic analysis of the Brassicales clade on an alignment-free sequence comparison method From SARS to MERS: 10 years of research on highly pathogenic human coronaviruses Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison A new method to cluster DNA sequences using Fourier power spectrum Complete genome sequence of middle east respiratory syndrome Coronavirus KOR/KNIH/002_05_2016, isolated in South Korea Evolutionary and inheritance of animal mitochondrial DNA: rules and exceptions Virus classification in 60-dimensional protein space Complete genome sequence of middle east respiratory syndrome Coronavirus (MERS-CoV) from the first imported MERS-CoV case in China Mitochondrial data are not suitable for resolving placental mammals phylogeny Molecular phylogenetics and the origins of placental mammals Large-scale sequence analysis of avian influenza isolates Comparison of phylogenetic trees Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions Numerical Taxonomy The interrelationships of placental mammals and the limits of phylogenetic inference Characterization and complete genome sequence of a novel Coronavirus, Coronavirus HKU1, from patients with 
pneumonia Therefore, it can distinguish different sequences and classify species into correct clusters with higher accuracy and less time cost. keywords: dataset; method; sequence; vector; viruses cache: cord-266288-buc4dd5y.txt plain text: cord-266288-buc4dd5y.txt item: #64 of 119 id: cord-266794-oyppubq5 author: Zhang, Dachuan title: SARS2020: An integrated platform for identification of novel coronavirus by a consensus sequence-function model date: 2020-09-01 words: 1019 flesch: 41 summary: For sequence function annotation, the family classification method captures common properties from the samples and extracts their feature vectors using machine learning algorithms, then merges the sequences into clusters or families. These predicted functions will provide valuable reference for further study of biological activity and pathogenesis of the 2019-nCoV. We built an integrated platform to assist 2019-nCoV research, and we proposed a novel consensus sequence-function model for using genome sequence data to identify unknown species. keywords: 2019; function; ncov; sequence cache: cord-266794-oyppubq5.txt plain text: cord-266794-oyppubq5.txt item: #65 of 119 id: cord-266960-kyx6xhvj author: Temple, Mark D. title: Real-time audio and visual display of the Coronavirus genome date: 2020-10-02 words: 6781 flesch: 52 summary: This paper demonstrates that sonification of RNA sequence data may also be useful to understand how the genome functions. 
During this time a large body of evidence has arisen regarding RNA sequence homology to other SARS-like virus strains. keywords: audio; data; display; genome; reading; region; rna; sequence; sonification; transcription; translation cache: cord-266960-kyx6xhvj.txt plain text: cord-266960-kyx6xhvj.txt item: #66 of 119 id: cord-267500-x3u9i1vq author: Rose, Rebecca title: Challenges in the analysis of viral metagenomes date: 2016-08-03 words: 5929 flesch: 26 summary: Automatic pipelines which combine various homology search strategies to identify a final set of viral reads include VirusHunter (Zhao et al. 2013), a Perl script that automates viral identification using BLAST prior to assembly; MetaVir (Roux et al. 2011), a web application that compares users' datasets to published viral sequences; and VirSorter (Roux et al. 2015), which identifies prophages and viruses by comparison with custom datasets. Various software tools have been developed to accommodate the unique challenges and use cases associated with characterizing viral sequences; however, the quality of these tools varies, and their use often necessitates computing expertise or access to powerful computers, thus limiting their usefulness to many researchers. keywords: analysis; approaches; assembly; data; et al; genomes; graph; novo; reads; sequences; sequencing; tools cache: cord-267500-x3u9i1vq.txt plain text: cord-267500-x3u9i1vq.txt item: #67 of 119 id: cord-268467-btfz6ye8 author: Schreiber, Steven S. title: Sequence analysis of the nucleocapsid protein gene of human coronavirus 229E date: 1989-03-31 words: 5049 flesch: 51 summary: The mRNAs of coronaviruses contain a stretch of leader sequence which is derived from the 5'-end of the viral genome and exhibits homology with the intergenic consensus sequence (Budzilowicz et al., 1985).
keywords: a/.; coronaviruses; hcv-229e; human; leader; mrna; nucleocapsid; protein; rna; sequence; virus cache: cord-268467-btfz6ye8.txt plain text: cord-268467-btfz6ye8.txt item: #68 of 119 id: cord-268549-2lg8i9r1 author: Dai, Qi title: Sequence comparison via polar coordinates representation and curve tree date: 2012-01-07 words: 4368 flesch: 49 summary: At the same time, if the value of o is too large, the curvature difference on the small-scale will be covered that is not good for sequence representation either. The curve tree was then constructed to numerically characterize the closed curve of biological sequences, and further compared biological sequences by evaluating the distance of the curve tree of the query sequence matching against a corresponding curve tree of the template sequence. keywords: curve; dna; et al; randic; representation; sequences; tree cache: cord-268549-2lg8i9r1.txt plain text: cord-268549-2lg8i9r1.txt item: #69 of 119 id: cord-274056-9t3kneoo author: Abd Elwahaab, Marwa A. 
title: A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector date: 2019-05-08 words: 3315 flesch: 55 summary: It is a figure which summarizes our approach. In our work, a representative of each of three groups of protein sequences is introduced. keywords: dissimilarity; group; protein; sequences; similarity; vector cache: cord-274056-9t3kneoo.txt plain text: cord-274056-9t3kneoo.txt item: #70 of 119 id: cord-275258-azpg5yrh author: Mead, Dylan J.T.
title: Visualization of protein sequence space with force-directed graphs, and their application to the choice of target-template pairs for homology modelling date: 2019-07-26 words: 6335 flesch: 48 summary: As the taxonomical distance increases, production of high quality homology models becomes more difficult. Human-infective virus: importance to human health; NCBI RefSeq annotated genome: easy retrieval of high quality RdRP sequence; RdRP located at the 3' end of polyprotein or on its own segment: eliminates unconventional RdRPs. keywords: genus; homology; modelling; models; quality; rdrp; sequence; structure; table; target; template cache: cord-275258-azpg5yrh.txt plain text: cord-275258-azpg5yrh.txt item: #71 of 119 id: cord-279528-41atidai author: Abo-Elkhier, Mervat M. title: Measuring Similarity among Protein Sequences Using a New Descriptor date: 2019-11-22 words: 3048 flesch: 52 summary: The graphical representation of protein sequence is a simple way to visualize protein sequences.
keywords: amino; protein; representation; sequences; similarity; table cache: cord-279528-41atidai.txt plain text: cord-279528-41atidai.txt item: #72 of 119 id: cord-280881-5o38ihe0 author: Wlodawer, Alexander title: A model of tripeptidyl-peptidase I (CLN2), a ubiquitous and highly conserved member of the sedolisin family of serine-carboxyl peptidases date: 2003-11-11 words: 4872 flesch: 47 summary: Figure 5: A homology-derived model of human CLN2.
One of the symptoms of the disease is the accumulation of an autofluorescent material, ceroid-lipofuscin, in lysosomal storage bodies in various cell types, primarily in the nerv- Figure 6: A model of the active site of human CLN2. keywords: cln2; conserved; enzymes; figure; human; kumamolisin; model; residues; sedolisin; sequence cache: cord-280881-5o38ihe0.txt plain text: cord-280881-5o38ihe0.txt item: #73 of 119 id: cord-287634-64zqe4cz author: Al-Ssulami, Abdulrakeeb M. title: CodSeqGen: A tool for generating synonymous coding sequences with desired GC-contents date: 2020-01-31 words: 2307 flesch: 54 summary: In this paper, we present an algorithmic solution to produce coding sequences that follow exactly a primary amino acid sequence and a desired GC-content. Although these tools generate random DNA and coding sequences, none of them is capable of producing coding sequences given the amino acid sequence and GC-content. keywords: amino; content; sequences cache: cord-287634-64zqe4cz.txt plain text: cord-287634-64zqe4cz.txt item: #74 of 119 id: cord-287658-c2lljdi7 author: Lopez-Rincon, Alejandro title: Classification and Specific Primer Design for Accurate Detection of SARS-CoV-2 Using Deep Learning date: 2020-09-10 words: 4786 flesch: 46 summary: These methods rely on the assumption that cDNA sequences share common features, and their order prevails among different sequences 19, 20. We then validate the discovered sequences on datasets not used during the training of the CNN, and show how to exploit them to create a novel, highly informative set of sequence features (e.g. viral sequences).
keywords: bps; coronavirus; cov-2; data; learning; primer; samples; sars; sequences; set; virus cache: cord-287658-c2lljdi7.txt plain text: cord-287658-c2lljdi7.txt item: #75 of 119 id: cord-291156-zxg3dsm3 author: Bernasconi, Anna title: Empowering Virus Sequences Research through Conceptual Modeling date: 2020-05-01 words: 4605 flesch: 34 summary: The manuscript is organized as follows: Section 2 overviews current technologies available for virus sequence data management. Many other resources link to viral sequence data, including: drug databases, particularly interesting as they provide information about clinical studies (see ClinicalTrials 10 ), protein sequences databases (e.g., UniProtKB/Swiss-Prot [32] ), and cell lines databases (e.g., Cellosaurus [3] ). keywords: cov2; covid-19; data; database; entity; genomic; information; model; sars; sequence; vcm; virus cache: cord-291156-zxg3dsm3.txt plain text: cord-291156-zxg3dsm3.txt item: #76 of 119 id: cord-296691-cg463fbn author: Wang, Ren title: De novo Sequence Assembly and Characterization of Lycoris aurea Transcriptome Using GS FLX Titanium Platform of 454 Pyrosequencing date: 2013-04-09 words: 5838 flesch: 41 summary: Hence, determination of the genetic pathways and specific genes involved in Amaryllidaceae alkaloids biosynthesis and some other aspects of Lycoris could be beneficial for humans and enrich our knowledge and understanding of functional genomics and biological research. For the purpose of improving mRNA abundance of genes related to Amaryllidaceae alkaloids biosynthesis, the leaves were treated with those abiotic elicitors for RNA extraction. 
keywords: alkaloids; amaryllidaceae; analysis; aurea; biosynthesis; cdna; galanthamine; genes; lycoris; molecular; sequences; sequencing; species; total; transcriptome cache: cord-296691-cg463fbn.txt plain text: cord-296691-cg463fbn.txt item: #77 of 119 id: cord-300149-djclli8n author: Ruan, Yijun title: Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection date: 2003-05-24 words: 4358 flesch: 47 summary: We compared sequence data generated from the library with human, mouse, and viral genome databases managed at the US National Center for Biotechnology The basic local alignment search tool is a system for searching similar sequences against all available sequence databases irrespective of whether the query is DNA or protein sequences. Associations between the members of the coronaviridae family to the SARS virus were assessed by comparing overlapping fragments of the SIN2500 genomic sequence against a database of coronavirus sequences. 
keywords: analysis; coronavirus; cov; genome; hotel; isolates; protein; rna; sars; sequence; singapore; spike cache: cord-300149-djclli8n.txt plain text: cord-300149-djclli8n.txt item: #78 of 119 id: cord-300796-rmjv56ia author: None title: The signal sequence of the p62 protein of Semliki Forest virus is involved in initiation but not in completing chain translocation date: 1990-09-01 words: 8108 flesch: 49 summary: However, the typical cytoplasmic orientation of the NH2-termini of membrane protein chains carrying a combined signal sequence-anchoring peptide suggests that signal sequences in general might direct their function in translocation through the insertion of their hydrophobic and uncharged stretch of amino acid residues into the membrane in such an orientation that the NH2-terminus of the signal remains on the outside of the ER mem- The possibility that our results about p62 protein translocation would be unique to the viral system and different from the general translocation process in the ER we find most unlikely. keywords: chain; dhfr; et al; fig; glycosylation; membrane; p62; p62 protein; p62 signal; protein; region; sequence; signal; signal sequence; time; translocation cache: cord-300796-rmjv56ia.txt plain text: cord-300796-rmjv56ia.txt item: #79 of 119 id: cord-300807-9u8idlon author: Tong, Joo Chuan title: 7 Infectious disease informatics date: 2013-12-31 words: 2437 flesch: 47 summary: In cases where the ancestry is unclear, sequence alignment methods can be used to infer their phylogenetic relationships.
keywords: acid; amino; diseases; selection; sequences; sites; substitution cache: cord-300807-9u8idlon.txt plain text: cord-300807-9u8idlon.txt item: #80 of 119 id: cord-301827-a7hnuxy5 author: Uversky, Vladimir N title: A decade and a half of protein intrinsic disorder: Biology still waits for physics date: 2013-04-29 words: 20990 flesch: 37 summary:
This study showed that the fraction of protein disorder was positively correlated with both measured RNA expression levels of E. coli genes in three different growth media and with predicted abundance levels of E. coli proteins.
keywords: acid; amino; analysis; binding; cell; complex; diseases; disorder; disordered proteins; disordered regions; domains; evolution; fact; folding; function; idps; interactions; membrane; molten; p53; partners; protein structure; proteins; regions; regulation; residues; sequence; signaling; state; structure cache: cord-301827-a7hnuxy5.txt plain text: cord-301827-a7hnuxy5.txt item: #81 of 119 id: cord-302161-ytr7ds8i author: Lutz, Mirjam title: FCoV Viral Sequences of Systemically Infected Healthy Cats Lack Gene Mutations Previously Linked to the Development of FIP date: 2020-07-24 words: 9917 flesch: 50 summary: To track viral sequence mutations in organs of healthy FCoV carrier cats, we investigated FCoV sequences detected in the colon, liver, and thymus, as well as feces of seven experimentally FCoV infected cats. The overall comparison of FCoV gene sequences from the different cats euthanized at different time points after infection did not reveal any significant differences (Table 3). keywords: cats; challenge; fcov; fecal; feline; fip; gene; infection; mutations; samples; sequences; study; tissue; virus cache: cord-302161-ytr7ds8i.txt plain text: cord-302161-ytr7ds8i.txt item: #82 of 119 id: cord-302798-q0mbngqy author: Ge, Junwei title: Genomic characterization of circoviruses associated with acute gastroenteritis in minks in northeastern China date: 2018-06-14 words: 4347 flesch: 51 summary: The examination of other MiCV sequences from different regions will help to assess the level of genetic diversity. Other sequences were obtained from GenBank; accession numbers of those sequences are included in the tree to our knowledge of the pathogenic potential of MiCV and its association with mink enteritis if our results were corroborated by further reports.
keywords: amino; analysis; batcv; circovirus; cvs; genome; micv; mink; nucleotide; sequence; tac; tat cache: cord-302798-q0mbngqy.txt plain text: cord-302798-q0mbngqy.txt item: #83 of 119 id: cord-304607-td0776wj author: Paszkiewicz, Konrad H. title: Omics, Bioinformatics, and Infectious Disease Research date: 2010-12-24 words: 7023 flesch: 39 summary: In addition, 21 nonannotated regions had clear levels of transcription and should therefore be considered as genes (Passalacqua et al., 2009). Indeed, the first bacterial genomes sequenced were those from pathogens (Fraser et al., 1995; Tomb et al., 1997), and these were preceded by many bacteriophage genomes such as bacteriophage MS2 (Fiers et al., 1976) and ϕX174 (Sanger et al., 1977) and viral genomes (Fiers et al., 1978). keywords: analysis; assembly; bioinformatics; data; disease; et al; genes; genome; genomics; proteins; sequence; sequencing; species; vaccine cache: cord-304607-td0776wj.txt plain text: cord-304607-td0776wj.txt item: #84 of 119 id: cord-304869-l6a68tqn author: Bielińska-Wąż, Dorota title: Graphical and numerical representations of DNA sequences: statistical aspects of similarity date: 2011-08-28 words: 15415 flesch: 58 summary: Though q may be easily increased up to higher-orders, as we shall see, the information about similarity sequences is specific enough up to the fourth order. Two bases belonging to different sequences, both located on the p-th positions are represented by a pair of numbers, {x_p, n_p}.
keywords: alignment; bases; descriptors; dna sequences; example; fig; graphs; methods; representation; sequences; similarity; table cache: cord-304869-l6a68tqn.txt plain text: cord-304869-l6a68tqn.txt item: #85 of 119 id: cord-306725-0vam15pt author: Li, Hao title: First detection and genomic characteristics of bovine torovirus in dairy calves in China date: 2020-05-09 words: 3021 flesch: 55 summary: Nucleotide and deduced amino acid sequences were compared using the MegAlign program of Lasergene software, version 7.1 (DNASTAR, Madison, WI, USA). In this research, we determined the complete genome sequences of two BToV isolates from the same farm in Sichuan province, increasing the number of BToV genome sequences in the GenBank database to five, thus contributing to a better understanding of the genome structure and genetic evolution of BToV. Phylogenetic analysis indicated that these two BToV isolates had a close genetic relationship to strains from Japan. keywords: acid; amino; bovine; btov; complete; sequences; strains; torovirus cache: cord-306725-0vam15pt.txt plain text: cord-306725-0vam15pt.txt item: #86 of 119 id: cord-310734-6v7oru2l author: Bolatti, Elisa M.
title: A Preliminary Study of the Virome of the South American Free-Tailed Bats (Tadarida brasiliensis) and Identification of Two Novel Mammalian Viruses date: 2020-04-09 words: 8482 flesch: 37 summary:
The analysis also identified (although in low counts) viral sequences related to the family Alloherpesviridae, which infects fish and amphibians. keywords: analysis; bat; bats; brasiliensis; contigs; dna; families; gene; genome; metagenomic; novel; pairs; protein; read; rep; samples; sequence; species; tbrapv1; viruses cache: cord-310734-6v7oru2l.txt plain text: cord-310734-6v7oru2l.txt item: #87 of 119 id: cord-311240-o0zyt2vb author: Motayo, Babatunde Olarenwaju title: Evolution and Genetic Diversity of SARSCoV-2 in Africa Using Whole Genome Sequences date: 2020-07-27 words: 3104 flesch: 42 summary: There has been paucity of data on the genetic evolution of SARSCoV-2 sequences from Africa, despite the increasing number of genome sequence submissions into the GISAID database from Africa; there were 97 whole genome sequences available in the GISAID database as at 24th April 2020. Results from our analysis showed recombination signals between the AfrSARSCoV-2 sequences and reference sequences within the N and S genes.
keywords: africa; analysis; et al; genome; sarscov-2; sequences; virus cache: cord-311240-o0zyt2vb.txt plain text: cord-311240-o0zyt2vb.txt item: #88 of 119 id: cord-311839-61djk4bs author: Wei, Dan title: A novel hierarchical clustering algorithm for gene sequences date: 2012-07-23 words: 8046 flesch: 58 summary: Major algorithms used in gene sequence clustering can be divided into two categories according to the result format: hierarchical clustering algorithms and partitional clustering algorithms. We have applied mBKM with DMk in clustering gene sequences and performing phylogenetic analysis. keywords: alignment; clustering; data; distance; dmk; mbkm; measure; method; number; sequences; tuple cache: cord-311839-61djk4bs.txt plain text: cord-311839-61djk4bs.txt item: #89 of 119 id: cord-321150-ev6acl7b author: Lam, Ha Minh title: Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm date: 2017-10-03 words: 3189 flesch: 44 summary: To illustrate improved runtimes and memory usage of the new 3SEQ algorithm, we searched for recombinants among large sequence data sets of dengue virus serotype 2, Ebola virus, the coronavirus responsible for Middle-East Respiratory Syndrome (MERS) and Zika virus; see table 1. Ebola virus sequences were restricted to human viruses sampled in Africa after December 1, 2013. keywords: recombination; sequence; sites; virus cache: cord-321150-ev6acl7b.txt plain text: cord-321150-ev6acl7b.txt item: #90 of 119 id: cord-321386-u1imic5l author: Li, Chun title: Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation date: 2018-02-17 words: 5524 flesch: 56 summary: The results illustrated the better performance of our method.
Identification of DNA-binding proteins using support vector machines and evolutionary profiles DNA-prot: identification of DNA binding proteins from protein sequence information using random forest iDNA-prot: identification of DNA binding proteins using random forest with grey model enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning gDNA-Prot: predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of Protein sequence Numerical characterization of protein sequences based on the generalized Chou's pseudo amino acid composition Light-directed synthesis of peptide nucleic acids (PNAs) chips Protein structure prediction from sequence variation Principles that govern the folding of protein chains Prediction of protein cellular attributes using pseudoamino acid composition Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions Identify recombination spots with pseudo dinucleotide composition Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM Protein sequence comparison based on physicochemical properties and the position-feature energy matrix A Novel protein characterization based on pseudo amino acids composition and star-like graph topological indices Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses A computational approach to simplifying the protein folding problem Modeling study on the validity of a possibly simplified representation of proteins 2-D 
graphical representation of protein sequences and its application to coronavirus phylogeny Clustering of the protein design alphabets by using hierarchical self-organizing map A novel descriptor of protein sequences and its application BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences Amino acid difference formula to help explain protein Correlation analysis of some physical chemistry properties among genetic codons and amino acids Similarity analysis of protein sequences based on the normalized relative entropy On 3-D graphical representation of DNA primary sequences and their numerical characterization Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation Milestones in graphical bioinformatics Graphical representation of proteins Representation of proteins as walks in 20-D space Phylogenetic analysis of DNA sequences based on k-word and rough set theory On the characterization of DNA primary sequences by triplet of nucleic acid bases DV-Curve: A novel intuitive tool for visualizing and analyzing DNA sequences A Novel method for similarity analysis and protein sub-cellular localization prediction The Zagreb indices 30 years after On vertex-degree-based molecular structure descriptors Graphs with fixed number of pendent vertices and minimal Zeroth-order general Randic index New invariant of DNA sequences Genetic drift of human coronavirus OC43 spike gene during adaptive evolution WHO MERS-CoV global summary and risk assessment Assessing the accuracy of prediction algorithms for classification: an overview iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition Using deformation energy to analyze nucleosome positioning in genomes iRNA-PseU: identifying RNA pseudouridine sites Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve Using a Euclid 
distance discriminant method to find protein coding genes in the yeast genome The authors' greatest gratitude goes to the anonymous referees for their insightful suggestions and generous support. This study is undertaken to develop an efficient computational approach for timely encoding protein sequences and extracting the hidden information. keywords: acids; amino; dataset; dna; group; matrix; method; model; protein; sequence; vector cache: cord-321386-u1imic5l.txt plain text: cord-321386-u1imic5l.txt item: #91 of 119 id: cord-321715-bkfkmtld author: Redelings, Benjamin D title: Incorporating indel information into phylogeny estimation for rapidly emerging pathogens date: 2007-03-14 words: 9797 flesch: 50 summary: The order of sequence alignment can bias the selection of tree topology An evolutionary model for maximum likelihood alignment of DNA sequences Inching towards reality: an improved likelihood model of sequence evolution Joint Bayesian Estimation of Alignment and Phylogeny A codon-based model of nucleotide substitution for protein-coding DNA sequences Mathematical and Statistical Methods for Genetic Analysis Subtree Transfer Operations and their Induced Metrics on Evolutionary Trees Monte Carlo Strategies in Scientific Computing A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution Dating of the human-ape splitting by a molecular clock of mitochondrial DNA Wain-Hobson S: Antigenic Stimulation by BCG vaccine as an in vivo driving force for SIV replication and dissemination Evolution of a Noncoding Region of the Chloroplast Genome Gaps as characters in sequence-based phylogenetic analyses Incorporating information from length-mutational events into phylogenetic analysis The evolution of the non-coding Chloroplast DNA and its application in Plant Systematics Indel patterns of the plastid DNA trnL-trnF region within the genus Poa (Poaceae) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence
weighting, position specific gap penalties and weight matrix choice MUSCLE: multiple sequence alignment with high accuracy and high throughput We would like to thank Vladimir Minin for many helpful discussions. A major advantage of this symmetry is that it is clear how to construct alignment models on an unrooted tree and leads to greater simplicity in model implementation and, arguably, decreased computation time. keywords: alignment; branch; codon; data; distribution; indel; information; joint; length; model; number; sequence; set; tree cache: cord-321715-bkfkmtld.txt plain text: cord-321715-bkfkmtld.txt item: #92 of 119 id: cord-321762-7kiahjyy author: Nandy, Ashesh title: Chapter 5 The GRANCH Techniques for Analysis of DNA, RNA and Protein Sequences date: 2015-12-31 words: 9799 flesch: 43 summary: Developments in the graphical representation and numerical characterization of DNA sequences raised the possibilities of using similar analysis of protein sequences, albeit with difficulty arising from the fact that now we have to contend with 20 amino acids making up a protein chain whereas DNA sequences were made up of only four nucleotides. 
Paper presented at the Indo-US Workshop on Mathematical Chemistry Indexing scheme and similarity measures for macromolecular sequences On 3-D representation of DNA primary sequences Novel analysis of DNA and Protein sequences through Graphical Representation and Numerical Characterization techniques Novel Techniques of Graphical Representation and Analysis of DNA Sequences - A Review Visualization and analysis of DNA sequences using DNA walks Mathematical descriptors of DNA sequences: development and applications New Approaches to Drug-DNA Interactions Based on Graphical Representation and Numerical Characterization of DNA Sequences Graphical representation and mathematical characterization of protein sequences and applications to viral proteins DNA Sequence Visualization Characterizations of DNA Primary Sequences Molecular Descriptors for Chemoinformatics, Methods and Principles in Medicinal Chemistry Genome analysis: A new approach for visualisation of sequence organisation in genomes Mathematical characterisation of chaos game representation: New algorithms for nucleotide sequence analysis Chaos game representation of similarities and differences between genomic sequences H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences Random walk and gap plots of DNA sequences Graphical analysis of DNA sequence structure: III. keywords: analysis; bases; descriptors; dna; dna sequences; et al; gene; graphical; method; numerical; protein; protein sequences; representation; sequences cache: cord-321762-7kiahjyy.txt plain text: cord-321762-7kiahjyy.txt item: #93 of 119 id: cord-324021-y1vr1db0 author: Kozak, M.
title: Determinants of translational fidelity and efficiency in vertebrate mRNAs date: 1994-12-31 words: 5083 flesch: 38 summary: The scanning model for translation: an update A consideration of alternative models for the initiation of translation in eukaryotes Thyroid hormone receptor transcriptional activity is potentially autoregulated by truncated forms of the receptor Tracheal U (1992) N-terminal truncation of salmon calcitonin leads to calcitonin antagonists Mutation eliminating mitochondrial leader sequence of methylmalonyl-CoA mutase causes mut0 methylmalonic acidemia Translation of insulin-related polypeptides from messenger RNAs with tandemly reiterated copies of the ribosome binding site Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells An analysis of 5' non-coding sequences from 699 vertebrate messenger RNAs Context effects and (inefficient) initiation at non-AUG codons in eukaryotic cell-free translation systems Expression of bacterial chitinase protein in tobacco leaves using two photosynthetic gene promoters Cavener DR (1991) Translation initiation in Drosophila melanogaster is reduced by mutations upstream of the AUG initiator codon Mutational analysis of the HIS4 translational initiator region in Saccharomyces cerevisiae Influence of the three nucleotides upstream of the initiation codon on expression of the E. coli lacZ gene in S. cerevisiae.
structural protein initiation codons: effects on regulation of synthesis and biological activity Human gene mutations affecting RNA processing and translation An initiation codon mutation in CD18 in association with the moderate phenotype of leukocyte adhesion deficiency Enhanced translational efficiency of a novel transforming growth factor β3 mRNA in human breast cancer cells Effect of growth hormone on levels of differentially processed IGF-I keywords: aug; aug codon; codon; context; initiation; leader; mrna; non; protein; sequence; structure; translation cache: cord-324021-y1vr1db0.txt plain text: cord-324021-y1vr1db0.txt item: #94 of 119 id: cord-324216-ce3wa889 author: Wang, Zheng title: Resequencing microarray probe design for typing genetically diverse viruses: human rhinoviruses and enteroviruses date: 2008-12-01 words: 5213 flesch: 45 summary: The limited number of HRV sequences available in GenBank during the time of design of RPM-Flu v.30/31 rendered a few of the targets represented on RPM-Flu v.30/31 are shorter than 200 bp. A minimal number of probe sequences (26 for HRV and 13 for HEV), which were potentially capable of detecting all serotypes of HRV and HEV, were determined and implemented on the Resequencing Pathogen Microarray RPM-Flu v.30/31 (Tessarae RPM-Flu). keywords: base; design; hev; hrv; microarray; prototype; resequencing; rpm; sequences; serotypes; strains cache: cord-324216-ce3wa889.txt plain text: cord-324216-ce3wa889.txt item: #95 of 119 id: cord-325043-vqjhiv7p author: Gorbalenya, Alexander E. title: An NTP-binding motif is the most conserved sequence in a highly diverged monophyletic group of proteins involved in positive strand RNA viral replication date: 1989 words: 6807 flesch: 42 summary: In fact, in recent studies, protein sequences were searched for the A consensus alone as the B consensus in its loosest form is obviously too degenerate to be unequivocally recognized, except in a family of diverged proteins (see below).
Protein sequences were extracted from the current literature (for references see Table 1). keywords: consensus; et al; families; family; motif; ntp; proteins; residues; rna; sequence; viruses cache: cord-325043-vqjhiv7p.txt plain text: cord-325043-vqjhiv7p.txt item: #96 of 119 id: cord-325750-x7jpsnxg author: Mokili, John L title: Metagenomics and future perspectives in virus discovery date: 2012-01-20 words: 8747 flesch: 36 summary: No association of xenotropic murine leukemia virus-related viruses with prostate cancer Reliability and reproducibility issues in DNA microarray measurements Efficient isolation of genes differentially expressed on cellulose by suppression subtractive hybridization in Agaricus bisporus Virus discovery by sequence-independent genome amplification Suppression subtraction hybridization (SSH) and macroarray techniques reveal differential gene expression profiles in brain of sea bream infected with nodavirus Suppression subtractive hybridization: a versatile method for identifying differentially expressed genes Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi's sarcoma A novel DNA virus (TTV) associated with elevated transaminase levels in posttransfusion hepatitis of unknown etiology Identification of two flavivirus-like genomes in the GB hepatitis agent STAT1-dependent innate immunity to a Norwalk-like virus Sequence-independent, single-primer amplification (SISPA) of complex DNA populations Metagenomics and the molecular identification of novel viruses Viruses in the faecal microbiota of monozygotic twins and their mothers Hepatitis E virus (HEV): the novel agent responsible for enterically transmitted non-A, non-B hepatitis The isolation and characterization of a Norwalk virus-specific cDNA Identification of a novel astrovirus (astrovirus VA1) associated with an outbreak of acute gastroenteritis Detection of a novel astrovirus in brain tissue of mink suffering from shaking mink syndrome by use of viral
metagenomics A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species Laboratory procedures to generate viral metagenomes An excellent compilation of standard operating procedures to perform metagenomic analysis on different types of samples The marine viromes of four oceanic regions Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing Analysis of the virus population present in equine faeces indicates the presence of hundreds of uncharacterized virus genomes Multiple diverse circoviruses infect farm animals and are commonly found in human and chimpanzee feces Bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses Viral diversity and dynamics in an infant gut RNA viral community in human feces: prevalence of plant pathogenic viruses Viral communities associated with healthy and bleaching corals Metagenomic analysis of stressed coral holobionts Assembly of viral metagenomes from yellowstone hot springs Using pyrosequencing to shed light on deep mine microbial ecology Microbes and health sackler colloquium: metagenomic detection of phage-encoded platelet-binding factors in the human oral cavity Extraction of high molecular weight genomic DNA from soils and sediments Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification Assessment of whole genome amplification-induced bias through highthroughput, massively parallel whole genome sequencing Whole transcriptome amplification for gene expression profiling and development of molecular archives Single virus genomics: a new tool for virus discovery Flow cytometric detection of viruses DNA sequencing with chainterminating inhibitors Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of 
viruses Arbovirus detection in insect vectors by rapid, high-throughput pyrosequencing Isolation and characterization of Solenopsis invicta virus 3, a new positive-strand RNA virus infecting the red imported fire ant, Solenopsis invicta A new arenavirus in a cluster of fatal transplant-associated diseases Genomic and phylogenetic characterization of Merino Walk virus, a novel arenavirus isolated in South Africa Parallel tagged sequencing on the 454 platform Targeted high-throughput sequencing of tagged nucleic acid samples The history of pyrosequencing A new method of sequencing DNA The not so universal tree of life or the place of viruses in the living world Reasons to include viruses in the tree of life Viral genomes are part of the phylogenetic tree of life There is no such thing as a tree of life (and of course viruses are out!) In this article, we review virus discovery techniques with a focus on metagenomic approaches that employ high-throughput sequencing technologies to characterize novel viruses. keywords: analysis; approach; characterization; culture; discovery; disease; dna; human; identification; koch; metagenomic; methods; molecular; novel; samples; sequence; sequencing; virus discovery; viruses cache: cord-325750-x7jpsnxg.txt plain text: cord-325750-x7jpsnxg.txt item: #97 of 119 id: cord-325985-xfzhn1n1 author: Jabado, Omar J. title: Comprehensive viral oligonucleotide probe design using conserved protein regions date: 2007-12-13 words: 4266 flesch: 41 summary: All four subtypes were subjected to the same three step design method: identification of conserved regions, extraction of nucleotide probe sequences, and minimization of covering probes. All probe sequences were compared to the non-redundant set of viral sequences by BLASTN (37).
keywords: database; design; method; motif; nucleic; pfam; probe; protein; sequences; viral; virus cache: cord-325985-xfzhn1n1.txt plain text: cord-325985-xfzhn1n1.txt item: #98 of 119 id: cord-326225-crtpzad7 author: Neill, John D. title: Simultaneous rapid sequencing of multiple RNA virus genomes date: 2014-06-01 words: 3807 flesch: 49 summary: These include methodologies based on PCR amplification of viral sequences, both in fragments (Rao et al., 2013) or fulllength genome amplification (Christenbury et al., 2010) . This was modified for amplification of viral sequences from serum to include a step where DNase I was used to first degrade host DNA (Allander et al., 2001) . keywords: dna; genomic; library; rna; sequences; sequencing; viruses cache: cord-326225-crtpzad7.txt plain text: cord-326225-crtpzad7.txt item: #99 of 119 id: cord-328259-3g4klpyg author: Guajardo-Leiva, Sergio title: Metagenomic Insights into the Sewage RNA Virosphere of a Large City date: 2020-09-21 words: 7642 flesch: 43 summary: Viral sequences can also be misannotated to homologous cellular genes [36, 39] , which relies on the low number and diversity of viral sequences in the databases. 
Viral sequences identified as Partitiviridae-like viruses included in the unclassified RNA viruses ShiM-2016 category in the NCBI taxonomy (~25% abundance; Figure 2B) and Totiviridae family were also highly abundant in treated and untreated sewage samples from the EU keywords: abundance; database; family; figure; human; ncbi; proteins; rdrp; rna; rotavirus; samples; sequences; sewage; trebal; viral; viruses; wastewater cache: cord-328259-3g4klpyg.txt plain text: cord-328259-3g4klpyg.txt item: #100 of 119 id: cord-328644-odtue60a author: Comandatore, Francesco title: Insurgence and worldwide diffusion of genomic variants in SARS-CoV-2 genomes date: 2020-05-28 words: 6537 flesch: 37 summary: If a functional role for this mutation will be demonstrated, this pattern seems to indicate that different variants might have different fitness when interacting with different host's haplotypes, i.e. in case Asian and European have different haplotypes concerning some of the proteins interacting with the Spike, like for instance Furin. When focusing on single Clades across all macro-regions previously defined, we find a heterogeneous situation with different variants increasing in time in different countries. keywords: coronavirus; et al; frequency; position; present; protein; sars; sequences; spike; time; variants; virus cache: cord-328644-odtue60a.txt plain text: cord-328644-odtue60a.txt item: #101 of 119 id: cord-330067-ujhgb3b0 author: Huang, Yi title: CoVDB: a comprehensive database for comparative analysis of coronavirus genes and genomes date: 2007-10-02 words: 3010 flesch: 50 summary: During the process of coronavirus gene sequences analysis, we encountered a major problem when coronavirus gene sequences, especially those of orf1ab, were used for blast search against GenBank or any other coronavirus databases. The main goal for setting up CoVDB is to provide a convenient and efficient platform for retrieving batches of coronavirus gene sequences.
keywords: coronavirus; covdb; genes; genome; group; proteins; sequence cache: cord-330067-ujhgb3b0.txt plain text: cord-330067-ujhgb3b0.txt item: #102 of 119 id: cord-330312-1pjolkql author: Liu, Y.-T. title: Infectious Disease Genomics date: 2017-01-20 words: 5181 flesch: 36 summary: 16, 17 The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002. In order to understand potential functions of human genes through comparative sequence analyses, they also advised that the HGP must not be restricted to the human genome and should include model organisms including mouse, bacteria, yeast, fruit fly, and worm. keywords: acid; artemisinin; disease; genome; hgp; human; influenza; malaria; parasites; project; sequence; sequencing; vaccine; vector; virus cache: cord-330312-1pjolkql.txt plain text: cord-330312-1pjolkql.txt item: #103 of 119 id: cord-331698-rwow1ydx author: Latorre-Pérez, Adriel title: A lab in the field: applications of real-time, in situ metagenomic sequencing date: 2020-08-20 words: 6734 flesch: 31 summary: ONT metagenomic sequencing results were similar to those obtained with Illumina 16S rRNA sequencing, but a reduced time was achieved using MinION. 
The next-generation sequencing revolution and its impact on genomics Actionable diagnosis of neuroleptospirosis by next-generation sequencing Analysis of culture-dependent versus culture-independent techniques for identification of bacteria in clinically obtained bronchoalveolar lavage fluid Nanopore sequencing as a rapidly deployable Ebola outbreak tool Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella Rapid identification of pathogens from positive blood culture bottles with the MinION nanopore sequencer Rapid nanopore sequencing of plasmids and resistance gene detection in clinical isolates Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis Metagenomic arbovirus detection using MinION nanopore keywords: 16s; analysis; applications; dna; identification; microbial; minion; nanopore; read; samples; sequences; sequencing; situ; technologies; time cache: cord-331698-rwow1ydx.txt plain text: cord-331698-rwow1ydx.txt item: #104 of 119 id: cord-334127-wjf8t8vp author: Brister, J. Rodney title: NCBI Viral Genomes Resource date: 2015-01-28 words: 3865 flesch: 25 summary: Given the difficulty of implementing a purely well annotated representation of viral genome sequences, the viral RefSeq model has evolved into a more flexible approach that includes both reference and representative sequences. The growing cloud of viral genome sequences also poses significant barriers to the maintenance of reference genome records. keywords: data; genome; records; reference; refseq; resource; sequence; species; taxonomy; virus; viruses cache: cord-334127-wjf8t8vp.txt plain text: cord-334127-wjf8t8vp.txt item: #105 of 119 id: cord-334394-qgyzk7th author: Edgar, Robert C.
title: Petabase-scale sequence alignment catalyses viral discovery date: 2020-08-10 words: 8139 flesch: 49 summary: Innovative fields such as high-throughput functional viromics [39] leverage these broad and rapidly growing collections of viral sequences, and can inform evidence-based policies responding to emerging pandemics [40, 41] . Accurate annotation of CoV genomes is challenging due to ribosomal frameshifts and polyproteins which are cleaved into maturation proteins [56] , and thus previously-annotated viral genomes offer a guide to accurate gene-calls and protein functional predictions. keywords: alignment; annotation; assembly; contigs; cov; coverage; data; family; figure; genome; identity; rdrp; reads; reference; rna; sequence; sequencing; serratus; sra; study; tree; virus cache: cord-334394-qgyzk7th.txt plain text: cord-334394-qgyzk7th.txt item: #106 of 119 id: cord-338207-60vrlrim author: Lefkowitz, E.J. title: Virus Databases date: 2008-07-30 words: 7958 flesch: 45 summary: Extensible markup language (XML) is another widely used format for storing database information. The original data may be faulty: using sequence data as one example, nucleotides in a DNA sequence may have been misread or miscalled, or someone may even have mistyped the sequence. 
keywords: biological; data; database; genbank; gene; information; ncbi; protein; record; sequence; table; viral; virus; viruses cache: cord-338207-60vrlrim.txt plain text: cord-338207-60vrlrim.txt item: #107 of 119 id: cord-339209-oe8onyr9 author: Vasilakis, Nikos title: Mesoniviruses are mosquito-specific viruses with extensive geographic distribution and host range date: 2014-05-20 words: 5821 flesch: 41 summary: Mesoniviridae: a proposed new family in the order Nidovirales formed by a single species of mosquito-borne viruses Examining landscape factors influencing relative distribution of mosquito genera and frequency of virus infection Discovery of the first insect nidovirus, a missing evolutionary link in the emergence of the largest RNA virus genomes An insect nidovirus emerging from a primary tropical rainforest Identification and characterization of genetically divergent members of the newly established family mesoniviridae Molecular biology and pathogenesis of roniviruses A new nidovirus (NamDinh virus NDiV): its ultrastructural characterization in the C6/36 mosquito cell line A new species of mesonivirus from the northern territory, australia Supramolecular architecture of severe acute respiratory syndrome coronavirus revealed by electron cryomicroscopy Rtips: fast and accurate tools for RNA 2D structure prediction using integer programming A Wolbachia symbiont in Aedes aegypti limits infection with dengue, Chikungunya, and Plasmodium The relative importance of innate immune priming in Wolbachia-mediated dengue interference The native Wolbachia endosymbionts of Drosophila melanogaster and Culex quinquefasciatus increase host resistance to West Nile virus infection Negevirus: a proposed new taxon of insect-specific viruses with wide geographic distribution The footprint of genome architecture in the largest genome expansion in RNA viruses Isolation of a Singh's Aedes albopictus cell clone sensitive to dengue and Chikungunya viruses SMART 7: recent updates 
to the protein domain annotation resource SMART, a simple modular architecture research tool: identification of signaling domains MUSCLE: multiple sequence alignment with high accuracy and high throughput New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0 TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing SSE: a nucleotide and amino acid sequence analysis platform Mesoniviruses are mosquito-specific viruses with extensive geographic distribution and host range Additional file 5: Figure S5 . The compiled sequences had their relationship to other viruses determined by a BLASTX search. keywords: alignment; analysis; conserved; domains; figure; genome; isolates; mesoniviruses; ndiv; orf1a; region; sequence; species; structure cache: cord-339209-oe8onyr9.txt plain text: cord-339209-oe8onyr9.txt item: #108 of 119 id: cord-339915-8j04y50s author: Deng, Wei title: DV-Curve Representation of Protein Sequences and Its Application date: 2014-05-08 words: 2960 flesch: 45 summary: A novel 2-D graphical representation of DNA sequences of low degeneracy On the uniqueness of quantitative DNA difference descriptions in 2D graphical representation models Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation A class of new 2-D graphical represent ation of DNA sequences and their application Graphical representations of DNA as 2-D map H-L curve: a novel 2D graphical representation for DNA sequences DV-Curve: a novel intuitive tool for visualizing and analyzing DNA sequences Analysis of similarity/dissimilarity of DNA sequences based on chaos game representation A 3D graphical representation of DNA sequences and its application A group of 3D graphical representation of DNA sequences based on dual nucleotides New graphical representation of a DNA sequence based on the ordered dinucleotides and its application to sequence analysis Analysis of 
similarity/dissimilarity of DNA sequences based on a condensed curve representation Novel 4D numerical representation of DNA sequences On the similarity of DNA primary sequences based on 5-D representation Analysis of similarity/dissimilarity of DNA sequences based on nonoverlapping triplets of nucleotide bases Unique graphical representation of protein sequences based on nucleotide triplet codons A 2-D graphical representation of protein sequences based on nucleotide triplet codons Protein-based phylogenetic analysis by using hydropathy profile of amino acids 2-D Graphical representation of proteins based on physico-chemical properties of amino acids 2-D graphical representation of protein sequences and its application to coronavirus phylogeny New 3-D graphical representation of protein sequences and its application A 2D graphical representation of protein sequence and its numerical characterization Similarity/dissimilarity studies of protein sequences based on a new 2d graphical representation New technique: protein sequence analysis based on hydropathy profile of amino acids 3D graphical representation of protein sequences and their statistical characterization Similarity/dissimilarity analysis of protein sequences using the spatial median as a descriptor Modeling study on the validity of a possibly simplified representation of proteins On 3-D graphical representation of DNA primary sequences and their numerical characterization Novel 2-D graphical representation of DNA sequences and their numerical characterization Compact 2-D graphical representation of DNA Application of 2-D graphical representation of DNA sequence On the complexity of multiple sequence alignment A probabilistic measure for alignment-free sequence comparison An information-based sequence distance and its application to whole mitochondrial genome phylogeny A new sequence distance measure for phylogenetic tree construction A weighted least-squares approach for inferring phylogenies from 
incomplete distance matrices A novel coronavirus associated with severe acute respiratory syndrome The genome sequence of the SARS-associated coronavirus The Principles and Practice of Numerical Classification Characterization of a novel coronavirus associated with severe acute respiratory syndrome Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats The authors thank all the anonymous reviewers for their valuable suggestions and support. Comput Math Methods Med DOI: 10.1155/2014/203871 Based on the detailed hydrophobic-hydrophilic (HP) model of amino acids, we propose dual-vector curve (DV-curve) representation of protein sequences, which uses two vectors to represent one alphabet of protein sequences. keywords: curve; dna; protein; representation; sequences cache: cord-339915-8j04y50s.txt plain text: cord-339915-8j04y50s.txt item: #109 of 119 id: cord-340907-j9i1wlak author: Zarai, Yoram title: Evolutionary selection against short nucleotide sequences in viruses and their related hosts date: 2020-04-27 words: 8168 flesch: 41 summary: The virus and host coding sequences and association information was retrieved from a published database. We provide various novel discoveries that may shed light on the evolution of viral DNA sequences and on the virus co-evolution with its respective hosts.
keywords: analysis; codon; genes; genome; host; nucleotide; number; restriction; selection; sequences; size; viruses; zikv cache: cord-340907-j9i1wlak.txt plain text: cord-340907-j9i1wlak.txt item: #110 of 119 id: cord-341564-fvuwick5 author: Qi, Zhao-Hui title: Novel Method of 3-Dimensional Graphical Representation for Proteins and Its Application date: 2018-06-12 words: 2660 flesch: 50 summary: Novel spectral representation of RNA secondary structure without loss of information Milestones in graphical bioinformatics Four-component spectral representation of DNA sequences Graphical and numerical representations of DNA sequences: statistical aspects of similarity 2D-dynamic representation of DNA sequences Spectral-dynamic representation of DNA sequences 3D-dynamic representation of DNA sequences A group of 3D graphical representation of DNA sequences based on dual nucleotides WITHDRAWN: 2-D graphical representation of proteins based on physico-chemical properties of amino acids ADLD: a novel graphical representation of protein sequences and its application Protein map: an alignment-free sequence comparison method based on various properties of amino acids An efficient numerical method for protein sequences similarity analysis based on a new two-dimensional graphical representation Graphical representation of proteins as four-color maps and their numerical characterization A protein mapping method based on physicochemical properties and dimension reduction The graphical representation of protein sequences based on the physicochemical properties and its applications F-Curve, a graphical representation of protein sequences for similarity analysis based on physicochemical properties of amino acids Analysis of similarity/dissimilarity of protein sequences The genetic code and error transmission In this article, we propose a 3-dimensional graphical representation of protein sequences based on 10 physicochemical properties of 20 amino acids and the BLOSUM62 matrix. 
keywords: amino; method; protein; representation; sequences; similarity cache: cord-341564-fvuwick5.txt plain text: cord-341564-fvuwick5.txt item: #111 of 119 id: cord-341879-vubszdp2 author: Li, Lucy M title: Genomic analysis of emerging pathogens: methods, application and future trends date: 2014-11-22 words: 5030 flesch: 31 summary: Because of the simplistic assumptions of population genetics models, the population size inferred using coalescentbased methods cannot be directly interpreted as pathogen population size (prevalence of infection). Although the two approaches are methodologically different, both aim to reconstruct pathogen population history and produce estimates of epidemiological parameters, such as the reproductive number (R 0 ). keywords: analysis; coalescent; data; disease; models; pathogen; population; sequences; time; transmission cache: cord-341879-vubszdp2.txt plain text: cord-341879-vubszdp2.txt item: #112 of 119 id: cord-342785-55r01n0x author: Lemmon, Gordon H title: Predicting the sensitivity and specificity of published real-time PCR assays date: 2008-09-25 words: 4319 flesch: 46 summary: GL found real time PCR signatures in the literature, wrote Perl scripts, and performed the analysis of published signatures. It has been estimated that a minimum of 3-4 genomes are needed in order to computationally design TaqMan PCR signatures likely to detect most strains, with those isolates chosen for sequencing that have been selected to span gradients of geographic, phenotypic, and temporal variation [19] . 
keywords: assay; detection; pcr; primer; probe; sensitivity; sequences; signatures; time; virus cache: cord-342785-55r01n0x.txt plain text: cord-342785-55r01n0x.txt item: #113 of 119 id: cord-343863-q1y8uscj author: Whitney, Joe title: Recent Hits Acquired by BLAST (ReHAB): A tool to identify new hits in sequence similarity searches date: 2005-02-08 words: 3464 flesch: 58 summary: It allows the researcher to ask the question: what new sequences match my sequences since the last time I searched? ReHAB is designed to handle large numbers of query sequences, such as whole genomes or sets of genomes. keywords: blast; database; hits; query; rehab; results; sequences cache: cord-343863-q1y8uscj.txt plain text: cord-343863-q1y8uscj.txt item: #114 of 119 id: cord-344782-ond1ziu5 author: Zhang, Jing title: Identification of a novel nidovirus as a potential cause of large scale mortalities in the endangered Bellinger River snapping turtle (Myuchelys georgesi) date: 2018-10-24 words: 6005 flesch: 45 summary: Similarity to other viruses for each of the ORFs and their predicted amino acid sequences were determined by searches using BLASTn and BLASTp [13] algorithms through the NCBI server (http://blast.ncbi.nlm.nih.gov/Blast.cgi). 
keywords: acid; animals; disease; georgesi; min; nidovirus; pcr; python; river; rna; samples; sequence; species; tissues; turtle; virus cache: cord-344782-ond1ziu5.txt plain text: cord-344782-ond1ziu5.txt item: #115 of 119 id: cord-345552-h6fwi0qn author: Li, Q.-G. 
title: Hydropathic characteristics of adenovirus hexons date: 1997-07-01 words: 3524 flesch: 48 summary: Every hexon DNA sequence was translated to a protein sequence using the program EditSeq-Translation. Here, we report the hydropathy analysis of 14 adenovirus hexon sequences predicted from a newly determined Ad7 hexon DNA sequence and thirteen published hexon sequences of Ad2, Ad3, Ad4, Ad5, Ad12, Ad16, Ad40, Ad41, Ad48, Bav3, Mav1, Fav1 and Fav10. keywords: acid; adenovirus; amino; dna; hexon; regions; sequence; type cache: cord-345552-h6fwi0qn.txt plain text: cord-345552-h6fwi0qn.txt item: #116 of 119 id: cord-348427-worgd0xu author: Hatcher, Eneida L. title: Virus Variation Resource – improved response to emergent viral outbreaks date: 2017-01-04 words: 5555 flesch: 43 summary: When searching protein sequences, selecting the 'Full-length sequences only' filter limits retrieved sequences to those with a complete coding region as determined relative to the relevant reference. Here, protein reference sequences are aligned with potential translations of the query sequence. keywords: annotation; data; metadata; nucleotide; protein; records; resource; search; sequences; terms; variation; virus; viruses cache: cord-348427-worgd0xu.txt plain text: cord-348427-worgd0xu.txt item: #117 of 119 id: cord-353290-1wi1dhv6 author: Kustin, Talia title: Biased mutation and selection in RNA viruses date: 2020-09-28 words: 7615 flesch: 42 summary: One major challenge in tackling RNA viruses is the fact that they are extremely genetically diverse. RNA viruses are an extremely diverse collection of entities, spanning a diverse range of hosts, morphologies, genome organizations, and genetic composition. 
keywords: bias; branches; codon; fig; genomes; host; mutation; nucleotide; rna; selection; sequences; usage; viruses cache: cord-353290-1wi1dhv6.txt plain text: cord-353290-1wi1dhv6.txt item: #118 of 119 id: cord-354465-5nqrrnqr author: Haslinger, Christian title: RNA structures with pseudo-knots: Graph-theoretical, combinatorial, and statistical properties date: 1999 words: 10375 flesch: 61 summary: Combinatorial aspects of RNA secondary structures have been studied in detail by Waterman and co-workers (Stein and Waterman, 1978; Waterman, 1978; Waterman and Smith, 1978a, b; Penner and Waterman, 1993; keywords: base; diagram; energy; graph; knots; neutral; number; pseudo; rna; sequences; structures; vertices cache: cord-354465-5nqrrnqr.txt plain text: cord-354465-5nqrrnqr.txt item: #119 of 119 id: cord-355075-ieb35upi author: Papenfuss, Anthony T title: The immune gene repertoire of an important viral reservoir, the Australian black flying fox date: 2012-06-20 words: 8959 flesch: 48 summary: The GO classification demonstrates that a diverse range of genes were identified in each of our two datasets providing a broad survey of bat genes. We have also begun to identify some of the genes involved in immune responses in this species and carry out functional studies in bat cells keywords: alecto; antiviral; bat; bats; cells; class; contigs; datasets; genes; immune; mammals; mhc; protein; receptors; sequences; species; thymus; transcriptome; transcripts; viruses cache: cord-355075-ieb35upi.txt plain text: cord-355075-ieb35upi.txt