item: #1 of 119
          id: cord-000257-ampip7od
      author: Bagowski, Christoph P
       title: The Nature of Protein Domain Evolution: Shaping the Interaction Network
        date: 2010-08-17
       words: 4681
      flesch: 35
     summary: In this review, we aim to describe the basic concepts of protein domain evolution and illustrate recent developments in molecular evolution that have provided valuable new insights in the field of comparative genomics and protein interaction networks. This approach thus primarily focuses on the similarities and differences of the orthologous genes within the network, is therefore ideally suited for the study of protein domain evolution, and has already revealed that species-specific parts Fig.
    keywords: analysis; binding; domains; evolution; expression; gene; genome; interaction; network; protein; sequence
       cache: cord-000257-ampip7od.txt
  plain text: cord-000257-ampip7od.txt

        item: #2 of 119
          id: cord-000473-jpow6iw1
      author: Astrovskaya, Irina
       title: Inferring viral quasispecies spectra from 454 pyrosequencing reads
        date: 2011-07-28
       words: 5369
      flesch: 49
     summary: The software provided by instrument manufacturers was originally designed to assemble all reads into a single genome sequence, and cannot be used for reconstructing quasispecies sequences. Since the number of different st-paths is exponential, we wish to generate a set of paths that have a high probability of corresponding to real quasispecies sequences.
    keywords: candidate; mismatches; quasispecies; reads; sequences; sequencing; shorah; vispa
       cache: cord-000473-jpow6iw1.txt
  plain text: cord-000473-jpow6iw1.txt

        item: #3 of 119
          id: cord-000642-mkwpuav6
      author: Moreira, Rebeca
       title: Transcriptomics of In Vitro Immune-Stimulated Hemocytes from the Manila Clam Ruditapes philippinarum Using High-Throughput Sequencing
        date: 2012-04-19
       words: 6864
      flesch: 41
     summary: Hits to R. philippinarum sequences were represented in a Venn diagram. The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum.
    keywords: analysis; bivalves; clam; contigs; expression; factor; genes; immune; philippinarum; proteins; recognition; response; ruditapes; sequences; species; transcriptome
       cache: cord-000642-mkwpuav6.txt
  plain text: cord-000642-mkwpuav6.txt

        item: #4 of 119
          id: cord-001340-kqcx7lrq
      author: Ladner, Jason T.
       title: Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing
        date: 2014-06-17
       words: 2513
      flesch: 34
     summary: Despite the small sizes of viral genomes, complications related to limited RNA quantities, host contamination, and secondary structure mean that it is often not time- or cost-effective to finish every genome, and, given the intended use, finishing may be unnecessary (5). One of the most common and important applications for viral genomes is in the study of viral epidemiology, which encompasses our understanding of the patterns, causes, and effects of disease.
    keywords: characterization; coverage; genome; sequences; sequencing; viruses
       cache: cord-001340-kqcx7lrq.txt
  plain text: cord-001340-kqcx7lrq.txt

        item: #5 of 119
          id: cord-001537-i34vmfpp
      author: Lima, Francisco Esmaile de Sales
       title: Genomic Characterization of Novel Circular ssDNA Viruses from Insectivorous Bats in Southern Brazil
        date: 2015-02-17
       words: 3883
      flesch: 46
     summary: Sequence analyses were performed with the BLASTX software (http://www.ncbi.nlm.nih.gov/blast/). Pan-reactive primers targeting the conserved rep region of circoviruses and cycloviruses were used to screen DNA from bat fecal samples.
    keywords: batcv; bats; cap; circoviruses; cyclovirus; dna; genomes; rep; samples; sequences
       cache: cord-001537-i34vmfpp.txt
  plain text: cord-001537-i34vmfpp.txt

        item: #6 of 119
          id: cord-001786-ybd8hi8y
      author: Dutilh, Bas E
       title: Metagenomic ventures into outer sequence space
        date: 2014-12-15
       words: 2194
      flesch: 37
     summary: However, it remains an open question, what is the actual size of biological sequence space?
    keywords: metagenomics; sequence; sequencing; space; unknowns
       cache: cord-001786-ybd8hi8y.txt
  plain text: cord-001786-ybd8hi8y.txt

        item: #7 of 119
          id: cord-001835-0s7ok4uw
      author: None
       title: Abstracts of the 29th Annual Symposium of The Protein Society
        date: 2015-10-01
       words: 138771
      flesch: 38
     summary: In conclusion, the analysis of hydropathic environments strongly suggests that the orientation of a residue in a three-dimensional structure is a direct consequence of its hydropathic environment, which leads us to propose a new paradigm, interaction homology, as a key factor in protein structure. In computer simulation modeling of protein structure in a solvent medium, explicit, implicit, and effective-medium approaches are often adopted to incorporate the effects of solvation.
    keywords: acid; activation; activity; addition; affinity; amino; amyloid; analysis; antibodies; antibody; antigen; approach; assay; assembly; associated; bacterial; binding; biology; bonds; cancer; catalytic; cell; cellular; chain; changes; characterization; chemical; chemistry; coli; complex; computational; concentration; conditions; conformation; conserved; control; core; cross; crystal; crystal structure; data; department; determine; development; dimer; disease; disordered; disordered proteins; dna; docking; domain; drug; effect; energy; enzyme; essential; experiments; expression; factors; family; fluorescence; fluorescent protein; formation; forms; fragments; free; functions; gene; group; helix; human; hydrogen; hydrophobic; important; increase; inhibitors; institute; key; kinetic; level; ligand; light; like; lipid; loop; low; major; mass; mechanism; membrane protein; method; model; modification; molecular; molecules; motif; mutant; mutations; n protein; native; nature; new; nmr; non; novel; number; oligomers; order; pathways; peptide; potential; prediction; presence; present; process; processes; properties; protease; protein; protein aggregation; protein association; protein complexes; protein concentration; protein data; protein degradation; protein design; protein domain; protein dynamics; protein engineering; protein evolution; protein expression; protein families; protein folding; protein function; protein interactions; protein interface; protein kinases; protein molecules; protein production; protein sequences; protein stability; protein structure; protein surface; protein tyrosine; provide; range; ray; reaction; receptor; recognition; recombinant; region; regulation; research; residues; response; results; rna; role; self; sequence; set; shows; signal; signaling; simulations; site; size; solution; species; specific; specificity; spectroscopy; stability; state; step; structural; studies; study; substrate; subunit; surface; synthetic; system; target; 
target protein; tau protein; techniques; temperature; terminal; time; transcription; transfer; transition; transmembrane; type protein; understanding; unfolding; university; use; variants; virus; vitro; wild; work; yeast
       cache: cord-001835-0s7ok4uw.txt
  plain text: cord-001835-0s7ok4uw.txt

        item: #8 of 119
          id: cord-001974-wjf3c7a7
      author: Friis-Nielsen, Jens
       title: Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers
        date: 2016-02-19
       words: 5776
      flesch: 43
     summary: Sequence clusters that have been described in detail throughout the manuscript have been included as supplementary files. A grouping based on taxonomy, or a more data-driven approach that clusters sequence groups based on the associated datasets as seen in Figure 2, could be included as another iteration to strengthen the statistical associations.
    keywords: associations; cancer; clustering; clusters; contigs; data; features; human; parameters; samples; sequences; sequencing; species; table; virus
       cache: cord-001974-wjf3c7a7.txt
  plain text: cord-001974-wjf3c7a7.txt

        item: #9 of 119
          id: cord-002473-2kpxhzbe
      author: Das, Jayanta Kumar
       title: Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach
        date: 2017-03-31
       words: 4616
      flesch: 56
     summary: Secondly, we build a graph theoretic model using amino acid sequences, which is also applied to the cytochrome c7 family members, and some unique characteristics and their domains are highlighted.
    keywords: acids; amino; chemical; graph; group; ppca; ppcd; protein; sequence
       cache: cord-002473-2kpxhzbe.txt
  plain text: cord-002473-2kpxhzbe.txt

        item: #10 of 119
          id: cord-003316-r5te5xob
      author: Balloux, Francois
       title: From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic
        date: 2018-12-17
       words: 7342
      flesch: 29
     summary: However, a micro-costing analysis covering laboratory and personnel costs estimated the cost of clinical WGS at £481 per M. tuberculosis isolate versus £518 for standard methods, representing relatively marginal cost savings but significant time savings [63]. Somewhat ironically, the extremely rich information of WGS data, with every genome being unique, generates problems of its own.
    keywords: amr; analysis; costs; data; diagnostics; example; genome; microbiology; outbreak; resistance; sequence; sequencing; time; transmission; virulence; wgs
       cache: cord-003316-r5te5xob.txt
  plain text: cord-003316-r5te5xob.txt

        item: #11 of 119
          id: cord-004862-yv76yvy5
      author: Demers, G. William
       title: The L1 family of long interspersed repetitive DNA in rabbits: Sequence, copy number, conserved open reading frames, and similarity to keratin
        date: 1989
       words: 6680
      flesch: 55
     summary: In this paper, the rabbit L1 repeats are characterized more thoroughly, and the similarities and differences of L1 sequences between species are explored further. The overlap between reading frames 1 and 2 is conserved in mouse L1s, but the overlaps are not seen in the rabbit and human L1 sequences.
    keywords: dna; end; et al; fig; orf-1; rabbit; region; repeats; sequence
       cache: cord-004862-yv76yvy5.txt
  plain text: cord-004862-yv76yvy5.txt

        item: #12 of 119
          id: cord-004879-pgyzluwp
      author: None
       title: Programmed cell death
        date: 1994
       words: 81833
      flesch: 47
     summary: Bcl-2α is a mitochondrial or perinuclear-associated oncoprotein that prolongs the life span of a variety of cell types by interfering with programmed cell death. Single and repetitive uptake and release of CPZ were measured in each cell type after individual exposure or exposure in any combination of cell types. In 2-hour competitive uptake studies, fibroblasts reached 1.7 and 2.6 times the concentrations of C6 and ROC cells, respectively.
    keywords: acid; activation; activity; addition; adult; amino; analysis; animals; antibodies; binding; brain; calcium; cdna; cell lines; cells; changes; cloned; complex; concentrations; conditions; contrast; control; cultures; current; data; days; decrease; development; different; differentiation; dna; domain; early; effects; end; enzyme; epithelial; experiments; expression; extracts; factor; family; fold; form; formation; function; fusion; gene; gene expression; growth; homology; hormone; human; increase; induction; infected; inhibition; institut; interaction; intracellular; kda; kinase; levels; major; mammalian; mechanisms; medium; membrane; mice; molecular; mouse; mrna; muscle; mutant; nerve; neuronal; neurons; new; non; nuclear; nucleus; number; order; pathway; phosphorylation; play; positive; potential; presence; present; process; production; promoter; properties; protein; protein expression; rat; rate; rats; reaction; receptor; recombinant; recombination; region; regulation; release; replication; response; results; rna; role; sequence; signal; sites; species; specific; stage; stimulation; structure; studies; study; subunit; surface; synthesis; system; t cells; target; terminal; time; tissue; tnf; transcription; treatment; tumor; type; university; virus; vitro; vivo; yeast
       cache: cord-004879-pgyzluwp.txt
  plain text: cord-004879-pgyzluwp.txt

        item: #13 of 119
          id: cord-005060-n901y2d4
      author: ZHANG, Feiyun
       title: Complete Nucleotide Sequence of Ryegrass Mottle Virus : A New Species of the Genus Sobemovirus
        date: 2001
       words: 2606
      flesch: 53
     summary: Analysis of the in vitro translation products of RGMoV RNA suggests that the 68 kDa protein may represent a fusion protein of ORF 2-ORF 3 produced by frameshifting.
    keywords: amino; kda; orf; protein; rgmov; rna; sequence; virus
       cache: cord-005060-n901y2d4.txt
  plain text: cord-005060-n901y2d4.txt

        item: #14 of 119
          id: cord-010161-bcuec2fz
      author: Matson, David O.
       title: IV, 6. Calicivirus RNA recombination
        date: 2004-09-14
       words: 3338
      flesch: 37
     summary: It is clear that such clades are related to differences in capsid gene sequences; sequence differences are less marked in the RNA polymerase gene: when RNA polymerase region sequences are analyzed in phylogenetic analyses, statistically significant differences similar to those observed among capsid gene sequences do not occur. Models for RNA virus recombination have utilized two terminologies to describe the degree that features of the donor and acceptor templates are shared: homologous, aberrant homologous, and non-homologous types (Lai and Cavanagh, 1997) or sequence similarity-essential, similarity-assisted, and similarity-nonessential (Nagy and Simon, 1997).
    keywords: capsid; cvs; recombination; rna; sequence; strains
       cache: cord-010161-bcuec2fz.txt
  plain text: cord-010161-bcuec2fz.txt

        item: #15 of 119
          id: cord-010260-8lnpujip
      author: Anthonsen, Henrik W.
       title: The blind watchmaker and rational protein engineering
        date: 1994-08-31
       words: 17358
      flesch: 42
     summary: In the present review, some scientific areas of key importance for protein engineering are discussed, such as problems involved in deducing protein sequence from DNA sequence (due to posttranscriptional editing, splicing and posttranslational modifications), modelling of protein structures by homology, NMR of large proteins (including probing the molecular surface with relaxation agents), simulation of protein structures by molecular dynamics and simulation of electrostatic effects in proteins (including pH-dependent effects). Authors: Anthonsen, Henrik W.; Baptista, António; Drabløs, Finn; Martel, Paulo; Petersen, Steffen B. Journal: J Biotechnol. DOI: 10.1016/0168-1656(94)90152-x.
    keywords: acid; alignment; amino; approach; cases; charge; data; engineering; et al; fig; gene; information; interactions; methods; modelling; nmr; number; potential; prediction; protein; protein engineering; protein sequence; protein structure; relaxation; residues; resonance; sequence; site; solution; solvent; structure; use
       cache: cord-010260-8lnpujip.txt
  plain text: cord-010260-8lnpujip.txt

        item: #16 of 119
          id: cord-010273-0c56x9f5
      author: Simmonds, Peter
       title: Virology of hepatitis C virus
        date: 2001-10-10
       words: 7904
      flesch: 30
     summary:
    keywords: c virus; cell; cleavage; genome; genotypes; hcv; hepatitis; infection; patients; proteins; region; replication; rna; sequence; virus
       cache: cord-010273-0c56x9f5.txt
  plain text: cord-010273-0c56x9f5.txt

        item: #17 of 119
          id: cord-010499-yefxrj30
      author: Yelverton, Elizabeth
       title: The function of a ribosomal frameshifting signal from human immunodeficiency virus‐1 in Escherichia coli
        date: 2006-10-27
       words: 5905
      flesch: 52
     summary: Protein sequence analysis demonstrated the occurrence of two closely related frameshift mechanisms. Protein sequence analysis of the product indicates the occurrence of two slightly different mechanisms of shifting.
    keywords: amino; codon; cycle; frameshifting; gallant; leucine; limitation; protein; reading; sequence; site; trna
       cache: cord-010499-yefxrj30.txt
  plain text: cord-010499-yefxrj30.txt

        item: #18 of 119
          id: cord-011565-8ncgldaq
      author: Elworth, R A Leo
       title: To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics
        date: 2020-06-04
       words: 12966
      flesch: 47
     summary: While ntHash can be faster than xxHash, CityHash and MurmurHash, it is only appropriate for sequence data.
    keywords: algorithms; bloom; bloom filter; data; datasets; filter; functions; hash; hashing; memory; mers; minhash; number; query; reads; sequence; sequencing; set; similarity; sketch
       cache: cord-011565-8ncgldaq.txt
  plain text: cord-011565-8ncgldaq.txt

        item: #19 of 119
          id: cord-012975-u87ol3fs
      author: Ogiwara, Atsushi
       title: Construction of a dictionary of sequence motifs that characterize groups of related proteins
        date: 1992-09-17
       words: 3119
      flesch: 52
     summary: Sequence motifs with multiple blocks or sequence motifs with single blocks without substitution patterns could be used safely for superfamily assignment. Authors: Ogiwara, Atsushi; Uchiyama, Ikuo; Seto, Yasuhiko; Kanehisa, Minoru.
    keywords: database; motif; patterns; sequence; superfamilies; superfamily
       cache: cord-012975-u87ol3fs.txt
  plain text: cord-012975-u87ol3fs.txt

        item: #20 of 119
          id: cord-014461-2ubh9u8r
      author: Nelson, Oranmiyan W.
       title: Genome sequences published outside of Standards in Genomic Sciences, July - October 2012
        date: 2012-10-10
       words: 4132
      flesch: 32
     summary: carotovorum Bacteriophage PP1; Complete Genome Sequences of Two Persicivirga Bacteriophages, P12024S and P12024L; Genome sequence of the phage clP1, which infects the beer spoilage bacterium Pediococcus damnosus; Complete Genome Sequence of Pseudomonas aeruginosa Siphophage MP1412; Complete Genome Sequences of Two Pseudomonas aeruginosa Temperate Phages, MP29 and MP42, Which Lack the Phage-Host CRISPR Interaction; Genome Sequence of the Broad-Host-Range Pseudomonas Phage Φ-S1; Complete Genome Sequence of Staphylococcus aureus Bacteriophage GH15; Complete Genome Sequence of Vibrio vulnificus Bacteriophage SSP002; Whole genome sequence analyses of three African bovine rotaviruses reveal that they emerged through multiple reassortment events between rotaviruses from different mammalian species; Complete Genome Sequence of an Avian Leukosis Virus Isolate Associated with Hemangioma and Myeloid Leukosis in Egg-Type and Meat-Type Chickens; Genome Sequence of a Novel Reassortant H3N2 Avian Influenza Virus in Southern China; Complete Genome Sequence of an H5N2 Avian Influenza Virus Isolated from a Parrot in Southern China; Complete Genome Sequence of an Avian-Like H4N8 Swine Influenza Virus Discovered in Southern China; Complete Genome Sequence of a Novel Avian Paramyxovirus; Complete Genome Sequence of Avian Tembusu-Related Virus Strain WR Isolated from White Kaiya Ducks in Fujian; Complete Genome Sequence of Bluetongue Virus Serotype 9: Implications for Serotyping; Complete Genome Sequence of Bluetongue Virus Serotype 16 of Goat Origin from India; Genome Sequence of a Bombyx mori Nucleopolyhedrovirus Strain with Cubic Occlusion Bodies; Complete Genome Sequence of a Bovine Viral Diarrhea Virus 2 from Commercial Fetal Bovine Serum; Complete Genome Sequences of Two Novel European Clade Bovine Foamy Viruses from Germany and Poland; Complete Genome Sequences of Novel Canine Noroviruses in Hong Kong; Complete Genome Sequence Analysis of a Recent Chicken Anemia Virus Isolate and Comparison with a Chicken Anemia Virus Isolate from Human Fecal Samples in China; Complete Genome Sequence of a Chikungunya Virus Isolated in Guangdong; Complete Genome Sequences of Two Chinese Virulent Avian Coronavirus Infectious Bronchitis Virus Variants; Complete Genome Sequence of a Recombinant Coxsackievirus B4 from a Patient with a Fatal Case of Hand, Foot, and Mouth Disease in Guangxi; Complete Genome Sequence of a Novel Human Enterovirus C (HEV-C117) Identified in a Child with Community-Acquired Pneumonia; Complete Genome Sequence of the Genotype 4 Hepatitis E Virus Strain Prevalent in Swine in Jiangsu Province, China, Reveals a Close Relationship with That from the Human Population in This Area; Complete Genome Sequence of an H10N8 Avian Influenza Virus Isolated from a Live Bird Market in Southern China; Complete Genome Sequence of a Novel H9N2 Subtype Influenza Virus FJG9 Strain in China Reveals a Natural Reassortant Event; Characterization and Complete Genome Sequence of Human Coronavirus NL63 Isolated in China; Complete Genome Sequence of Ikoma Lyssavirus; Analysis of the complete genome sequence of two Korean sacbrood viruses in the Honey bee, Apis mellifera; The complete mitochondrial genome sequence of the western flower thrips Frankliniella occidentalis (Thysanoptera: Thripidae) contains triplicate putative control regions; Genome Sequence of Methylobacterium sp.; Complete Genome Sequence of a Street Rabies Virus from Mexico; Genome sequence of a waterfowl aviadenovirus, goose adenovirus 4
    keywords: accession; avian; bacillus; bacteriophage; bacterium; china; complete; draft; genome; mycobacterium; plasmid; porcine; pseudomonas; sequence; sequence accession; staphylococcus; strain; streptococcus; subsp; virus
       cache: cord-014461-2ubh9u8r.txt
  plain text: cord-014461-2ubh9u8r.txt

        item: #21 of 119
          id: cord-014462-11ggaqf1
      author: None
       title: Abstracts of the Papers Presented in the XIX National Conference of Indian Virological Society, “Recent Trends in Viral Disease Problems and Management”, on 18–20 March, 2010, at S.V. University, Tirupati, Andhra Pradesh
        date: 2011-04-21
       words: 35463
      flesch: 47
     summary: The following virus isolates have been used in the analysis: GTPV-Uttarkashi, P60, vaccine virus; GTPV Mukteswar, P10, challenge virus; GTPV (Akola), GTPV Bareilly/00, GTPV Ladakh/01 and GTPV Sambalpur/82, field isolates and SPPV Srinagar, P40; SPPV Ranipet, P50; SPPV-RF, P50, vaccine viruses and SPPV Makdhoom/07, SPPV CIRG/08, SPPV Pune/08, SPPV Bareilly, SPPV 183/03 and SPPV 125/02, field isolates. The present paper discusses virus diseases of quarantine importance affecting ornamental and fruit plants such as Chrysanthemum, Dahlia, Dianthus, Rosa bengalensis, Cattleya, Cymbidium, Dendrobium, Lilium, Citrus, and Vitis.
    keywords: acid; analysis; animals; antibodies; antigen; assay; cases; cells; cloned; control; crop; curl; dengue; detection; development; disease; dna; elisa; expression; field; food; gene; host; india; infection; isolates; leaf; management; methods; molecular; mosaic; mosaic virus; nucleotide; pathogens; patients; pcr; plant; positive; present; primers; production; protein; region; resistance; response; results; rna; samples; sequence; specific; study; symptoms; time; tomato; total; vaccine; vector; viral; virus; virus infection; viruses; world; yellow
       cache: cord-014462-11ggaqf1.txt
  plain text: cord-014462-11ggaqf1.txt

        item: #22 of 119
          id: cord-014674-ey29970v
      author: None
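       title: Dreizehnter Bericht nach Inkrafttreten des Gentechnikgesetzes (GenTG) für den Zeitraum vom 1.1.2002 bis 31.12.2002: Die Arbeit der Zentralen Kommission für die Biologische Sicherheit (ZKBS) im Jahr 2002 [Thirteenth Report after the Entry into Force of the Genetic Engineering Act (GenTG) for the Period 1 January 2002 to 31 December 2002: The Work of the Central Commission for Biological Safety (ZKBS) in 2002]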
        date: 2003
       words: 2525
      flesch: 47
     summary: and therefore is not expected to allow specific amplification of p-35S sequences.] In the sequences of the amplification products AF434754, -55, -56, and -57, in which the iPCR primer sequences can be identified, the nucleotide sequences ahead of the primers are not from p-35S. The expected p-35S sequence is only partially present ahead of iCVM1 in AF434758.
    keywords: der; des; die; dna; für; gentechnik; ipcr; maize; p-35s; sequences; und
       cache: cord-014674-ey29970v.txt
  plain text: cord-014674-ey29970v.txt

        item: #23 of 119
          id: cord-015850-ef6svn8f
      author: Saitou, Naruya
       title: Eukaryote Genomes
        date: 2013-08-22
       words: 7442
      flesch: 48
     summary: The complete nucleotide sequence of the tobacco mitochondrial genome: Comparative analysis of mitochondrial genomes in higher plants and multipartite organization Widespread horizontal transfer of mitochondrial genes in flowering plants Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin Small, repetitive DNAs contribute significantly to the expanded mitochondrial genome of cucumber The complete nucleotide sequence of the tobacco chloroplast genome: Its gene organization and expression Changes in the structure of DNA molecules and the amount of DNA per plastid during chloroplast development in maize Pattern of organization of human mitochondrial pseudogenes in the nuclear genome Why genes in pieces? Introns. As for plants, Kaplinsky [62] compared genome sequences of Arabidopsis, grape, rice, and Brachypodium and found more than 100 times as many CNSs in monocots as in dicots.
    keywords: dna; duplication; eukaryotes; evolution; genes; genome; human; introns; junk; number; plants; protein; rna; sequence; size; species; type
       cache: cord-015850-ef6svn8f.txt
  plain text: cord-015850-ef6svn8f.txt

        item: #24 of 119
          id: cord-016293-pyb00pt5
      author: Newell-McGloughlin, Martina
       title: The flowering of the age of Biotechnology 1990–2000
        date: 2006
       words: 22413
      flesch: 45
     summary: These DNA chips have broad commercial applications and are now used in many areas of basic and clinical research including the detection of drug resistance mutations in infectious organisms, direct DNA sequence comparison of large segments of the human genome, the monitoring of multiple human genes for disease associated mutations, the quantitative and parallel measurement of mRNA expression for thousands of human genes, and the physical and genetic mapping of genomes. Of course for such a radical approach certain basal level criteria needed to be established for selecting disease candidates for human gene therapy.
    keywords: animal; biology; biotechnology; cancer; cells; company; data; development; disease; dna; drug; expression; food; gene; gene therapy; genome; human; influenza; information; level; molecular; nih; number; plant; production; products; project; protein; research; rna; scientists; sequence; sequencing; stem cells; studies; system; techniques; technology; therapy; time; transfer; transgenic; university; use; virus; year
       cache: cord-016293-pyb00pt5.txt
  plain text: cord-016293-pyb00pt5.txt

        item: #25 of 119
          id: cord-016594-lj0us1dq
      author: Flower, Darren R.
       title: Identification of Candidate Vaccine Antigens In Silico
        date: 2012-09-28
       words: 12575
      flesch: 33
     summary: A long, naturally presented immunodominant epitope from NY-ESO-1 tumor antigen: implications for cancer vaccine design Identification and characterization of pathogenicity and other genomic islands using base composition analyses A novel strategy for the identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites in closely related bacteria MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands CpGcluster: a distance-based algorithm for CpG-island detection CpGIF: an algorithm for the identification of CpG islands Identifying CpG islands by different computational techniques CpG_MI: a novel approach for identifying functional CpG islands in mammalian genomes Evaluation of genomic island predictors using a comparative genomics approach IslandPath: aiding detection of genomic islands in prokaryotes Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models A computational approach for identifying pathogenicity islands in prokaryotic genomes Resolving the structural features of genomic islands: a machine learning approach Detection of genomic islands via segmental genome heterogeneity Prediction of pathogenicity islands in enterohemorrhagic Escherichia coli O157:H7 using genomic barcodes IslandViewer: an integrated interface for computational identification and visualization of genomic islands Towards pathogenomics: a web-based resource for pathogenicity islands Identification and characterization of a novel family of pneumococcal proteins that are protective against sepsis Functional genomics of pathogenic bacteria SYFPEITHI: database for searching and T-cell epitope prediction SYFPEITHI: database for MHC ligands and peptide motifs HIV sequence databases MHCBN 4.0: a database of MHC/TAP binding peptides and T-cell epitopes MHCBN: a comprehensive database of MHC binding and non-binding peptides EPIMHC: a curated database of MHC-binding
peptides for customized computational vaccinology AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data JenPep: a novel computational information resource for immunobiology and vaccinology JenPep: a database of quantitative functional peptide data for immunology The immune epitope database 2.0 AntigenDB: an immunoinformatics database of pathogen antigens VIOLIN: vaccine investigation and online information network Epitopic peptides with low similarity to the host proteome: towards biological therapies without side effects Peptimmunology: immunogenic peptides and sequence redundancy Primer: mechanisms of immunologic tolerance Recent advances in immune modulation Cutting edge: contributions of apoptosis and anergy to systemic T cell tolerance Discriminating antigen and non-antigen using proteome dissimilarity III: tumour and parasite antigens Discriminating antigen and non-antigen using proteome dissimilarity II: viral and fungal antigens Discriminating antigen and non-antigen using proteome dissimilarity: bacterial antigens Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Single proteins might have dual but related functions in intracellular and extracellular microenvironments Locating proteins in the cell using TargetP, SignalP and related tools Improved prediction of signal peptides: SignalP 3.0 A comprehensive assessment of N-terminal signal peptides prediction methods WoLF PSORT: protein localization predictor Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains SubLoc: a server/client suite for protein subcellular location based on SOAP 
Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins Advantages of combined transmembrane topology and signal peptide prediction-the Phobius web server Prediction of lipoprotein signal peptides in Gram-negative bacteria Prediction of twin-arginine signal peptides Validating subcellular localization prediction tools with mycobacterial proteins Toward bacterial protein sub-cellular location prediction: single-class discrimminant models for all gram- and gram+ compartments Multi-class subcellular location prediction for bacterial proteins Alpha helical trans-membrane proteins: enhanced prediction using a Bayesian approach Beta barrel trans-membrane proteins: enhanced prediction using a Bayesian approach A predictor of membrane class: discriminating alpha-helical and beta-barrel membrane proteins from non-membranous proteins TATPred: a Bayesian method for the identification of twin arginine translocation pathway signal sequences LIPPRED: a web server for accurate prediction of lipoprotein signal sequences and cleavage sites Combining algorithms to predict bacterial protein sub-cellular location: parallel versus concurrent implementations Predicting the subcellular localization of viral proteins within a mammalian host cell Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells Structure and sequence relationships in the lipocalins and related proteins Structural Relationship of Streptavidin to the Calycin Protein Superfamily Analysis of known bacterial protein vaccine antigens reveals biased physical properties and amino acid composition Adaptation of protein surfaces to subcellular location Hierarchical classification of G-protein-coupled receptors with data-driven selection of attributes and classifiers GPCRTree: online hierarchical classification of GPCR function Optimizing amino acid groupings for GPCR classification On the hierarchical classification of G protein-coupled receptors Proteomic applications of automated GPCR classification VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines Identifying candidate subunit vaccines using an alignment-independent method based on principal amino acid properties DNA and peptide sequences and chemical processes multivariately modeled by principal component analysis and partial least-squares projections to latent structures Principal property-values for 6 nonnatural amino-acids and their application to a structure activity relationship for oxytocin peptide analogs Peptide binding to the HLA-DRB1 supertype: a proteochemometrics analysis Proteochemometrics mapping of the interaction space for retroviral proteases and their substrates Proteochemometrics analysis of substrate interactions with dengue virus NS3 proteases Generalized modeling of enzyme-ligand interactions using proteochemometrics and local protein substructures Rough set-based proteochemometrics modeling of G-protein-coupled receptor-ligand interactions Improved approach for proteochemometrics modeling: application to organic compound-amine G protein-coupled receptor interactions Melanocortin receptors: ligands and proteochemometrics modeling Proteochemometrics modeling of the interaction of amine G-protein coupled receptors with a diverse set of ligands Peptide quantitative structure-activity relationships, a multivariate approach Multivariate parametrization of 55 coded and non-coded amino-acids New chemical descriptors relevant for the design of biologically active peptides. Vaccines based on APCs and peptides are new but unproven strategies; most modern vaccine development relies instead on effective searches for vaccine antigens.
    keywords: analysis; antigens; approach; binding; candidate; cell; data; database; discovery; epitope; genome; genomic; host; immunogenicity; islands; methods; mhc; peptide; prediction; protein; sequence; system; vaccines; vaccinology
       cache: cord-016594-lj0us1dq.txt
  plain text: cord-016594-lj0us1dq.txt

        item: #26 of 119
          id: cord-016798-tv2ntug6
      author: Gautam, Ablesh
       title: Bioinformatics Applications in Advancing Animal Virus Research
        date: 2019-06-06
       words: 6983
      flesch: 35
     summary: VIDA retrieves virus sequences from GenBank and the files are parsed into subfields. VIDA also provides functional classification of virus proteins into broad functional classes based on typical virus processes such as DNA and RNA replication, virus structural proteins, nucleotide and nucleic acid metabolism, transcription, glycoproteins and others.
    keywords: analysis; annotation; bioinformatics; database; et al; gene; genome; host; influenza; information; prediction; proteins; sequence; tools; virus; viruses; web
       cache: cord-016798-tv2ntug6.txt
  plain text: cord-016798-tv2ntug6.txt

        item: #27 of 119
          id: cord-017354-cndb031c
      author: Janies, D.
       title: Large-Scale Phylogenetic Analysis of Emerging Infectious Diseases
        date: 2008
       words: 12430
      flesch: 42
     summary: Here we review the computational challenges of comparative genomic analyses, specifically sequence alignment and reconstruction of phylogenetic trees. Phylogenetic trees are represented by acyclic graphs in which the leaves of these graphs represent the observed biological entities (taxa) being compared (e.g., sequences of genes, genomes, and/or anatomy of individuals, isolates or cultivars, species, or any higher level taxonomic unit).
    keywords: alignment; analysis; avian; character; data; host; human; influenza; isolates; length; methods; number; organisms; outgroup; search; sequence; strains; taxa; tree; viruses
       cache: cord-017354-cndb031c.txt
  plain text: cord-017354-cndb031c.txt

        item: #28 of 119
          id: cord-017584-9rx4jlw8
      author: Kim, Kwangsoo
       title: Selecting Genotyping Oligo Probes Via Logical Analysis of Data
        date: 2007
       words: 3665
      flesch: 50
     summary: In brief, the probe design methods of [2] and [27] required several CPU hours of computation and selected probes that obtained 85.6% and 81.1% correct classification rates, respectively. We used the three influenza virus N subtypes with 30 or more samples in Table 1 and selected monospecific probes for their classification.
    keywords: classification; data; probes; sequences; target
       cache: cord-017584-9rx4jlw8.txt
  plain text: cord-017584-9rx4jlw8.txt

        item: #29 of 119
          id: cord-017932-vmtjc8ct
      author: Georgiev, Vassil St.
       title: Genomic and Postgenomic Research
        date: 2009
       words: 8483
      flesch: 30
     summary: Next, these gene predictions can be further refined by searching for nearby regulatory sites such as the ribosome-binding sites, as well as by aligning protein sequences to other species. Large-scale prepublication information on genome sequences is a unique research resource for the scientific community, and rapid and unrestricted sharing of microbial genome sequence data is essential for advancing research on infectious agents responsible for human disease.
    keywords: analysis; centers; coli; data; diseases; genes; genome; genomic; host; human; influenza; microbial; niaid; proteins; research; sequence; sequencing
       cache: cord-017932-vmtjc8ct.txt
  plain text: cord-017932-vmtjc8ct.txt

        item: #30 of 119
          id: cord-018133-2otxft31
      author: Altman, Russ B.
       title: Bioinformatics
        date: 2006
       words: 9594
      flesch: 44
     summary: Computer systems within bioinformatics thus must be able to handle biological sequence information effectively and efficiently. Nonetheless, the effects of sequence information on clinical databases will be significant.
    keywords: analysis; bioinformatics; data; databases; dna; function; genes; genome; human; information; knowledge; molecules; protein; sequence; structure
       cache: cord-018133-2otxft31.txt
  plain text: cord-018133-2otxft31.txt

        item: #31 of 119
          id: cord-018459-isbc1r2o
      author: Munjal, Geetika
       title: Phylogenetics Algorithms and Applications
        date: 2018-12-10
       words: 1853
      flesch: 36
     summary: The limitations associated with sequence alignment methods led to the development of alignment-free sequence analysis. Multiple sequence alignment methods emphasize that more closely related sequences should be aligned first.
    keywords: alignment; methods; sequences; species; tree
       cache: cord-018459-isbc1r2o.txt
  plain text: cord-018459-isbc1r2o.txt

        item: #32 of 119
          id: cord-018963-2lia97db
      author: Xu, Ying
       title: Protein Structure Prediction by Protein Threading
        date: 2010-04-29
       words: 15314
      flesch: 39
     summary: The protein threading problem with sequence amino acid interaction preferences is NP-complete Introduction to Protein Architecture: The Structural Biology of Proteins A unified statistical framework for sequence comparison and structure comparison Emergence of preferred structures in a simple model of protein folding Are protein folds atypical? Designability of protein structures: A lattice-model study using the Miyazawa-Jernigan matrix A distance-dependent atomic knowledge-based potential for improved protein structure selection Geometric cooperativity and anti-cooperativity of three-body interactions in native proteins Multimeric threading-based prediction of protein-protein interactions on a genomic scale: Application to the Saccharomyces cerevisiae proteome Protein distance constraints predicted by neural networks and probability density functions Pcons: A neural-network-based consensus predictor that improves fold recognition Threading analysis suggests that the obese gene product may be a helical cytokine Comparative genomics of the Archaea (Euryarchaeota): Evolution of conserved protein families, the stable core, and the variable shell How many species are there on Earth? Improvement of the GenTHREADER method for genomic fold recognition Protein Structure Prediction by Protein Threading The Genomic Threading Database: A comprehensive resource for structural annotations of the genomes from key organisms Novel knowledge-based mean force potential at atomic level Statistical significance of protein structure prediction by threading Statistical significance of hierarchical multibody potentials based on Delaunay tessellation and their application in sequence-structure alignment SCOP: A structural classification of proteins database for the investigation of sequences and structures Protein superfamilies and domain superfolds CATH: a hierarchic classification of protein domain structures A local alignment method for protein structure motifs Threading with explicit models for evolutionary conservation of structure and sequence Combination of threading potentials and sequence profiles improves fold recognition Combinatorial Optimization: Algorithms and Complexity New techniques in structural NMR-anisotropic interactions Protein fold recognition through application of residual dipolar coupling data Protein structure prediction using sparse dipolar coupling data The anatomy and taxonomy of protein structure Graph minors. II. To keep up with the rate at which protein structures are being solved, there is a clear need for more automated domain-partitioning methods to process the newly solved structures.
    keywords: algorithm; alignment; amino; decomposition; energy; et al; families; fold; function; graph; number; prediction; problem; protein; protein structure; protein threading; query; sequence; structure; template; threading; tree
       cache: cord-018963-2lia97db.txt
  plain text: cord-018963-2lia97db.txt

        item: #33 of 119
          id: cord-022348-w7z97wir
      author: Sola, Monica
       title: Drift and Conservatism in RNA Virus Evolution: Are They Adapting or Merely Changing?
        date: 2007-09-02
       words: 10898
      flesch: 50
     summary: Muller's ratchet decreases fitness of a DNA-based microbe Increased immune response elicited by DNA vaccination with a synthetic gp120 sequence with optimized codon usage The phylogeny of The Canterbury Tales Isolation of new ribozymes from a large pool of random sequences Forced evolution of a regulatory RNA helix in the HIV-1 genome Role of the first and third extracellular domains of CXCR-4 in human immunodeficiency virus coreceptor activity Molecular Mechanisms of Immune Responses in Insects Nucleotide composition as a driving force in the evolution of retroviruses Unusually high frequency of Epstein-Barr virus genetic variants in Papua New Guinea that can escape cytotoxic T-cell recognition: implications for virus evolution Role of host immune response in selection of equine infectious anemia virus variants Fitness of RNA virus decreased by Muller's ratchet Evolution of sex and the molecular clock in RNA viruses HIV and T-cell expansion in splenic white pulps is accompanied by infiltration of HIV-specific cytotoxic T-lymphocytes Antigenic stimulation by BCG as an in vivo driving force for SIV replication and dissemination Genetic bottlenecks and population passages cause profound fitness differences in RNA viruses Nucleotide sequences of three Nodavirus RNA2's: the messengers for their coat protein precursors Primary and secondary structure of black beetle virus RNA2, the genomic messenger for BBV coat protein precursor HLA-A11 epitope loss isolates of Epstein-Barr virus from a highly A11+ population T cell responses and virus evolution: loss of HLA A11-restricted CTL epitopes in Epstein-Barr virus isolates from highly A11-positive populations by selective mutation of anchor residues RNA virus quasispecies populations can suppress vastly superior mutant progeny The genome sequence of herpes simplex virus type 2 RNA viral mutations and fitness for survival Basic concepts in RNA virus evolution Origins and evolutionary relationships of retroviruses
Rates of spontaneous mutations among RNA viruses Rapid fitness losses in mammalian RNA virus clones due to Muller's ratchet High viral load and CD4 lymphopenia in rhesus and cynomolgus macaques infected by a chimeric primate lentivirus constructed using the env, rev, tat, and vpu genes from HIV-1 Lai The viral quasispecies Sequence space and quasispecies distribution Structurally complex and highly active RNA ligases derived from random RNA sequences Does the VP1 gene of foot-and-mouth disease virus behave as a molecular clock? key: cord-022348-w7z97wir authors: Sola, Monica; Wain-Hobson, Simon title: Drift and Conservatism in RNA Virus Evolution: Are They Adapting or Merely Changing? date: 2007-09-02 journal: Origin and Evolution of Viruses DOI: 10.1016/b978-012220360-2/50007-6 sha: doc_id: 22348 cord_uid: w7z97wir This chapter argues that the vast majority of genetic changes or mutations fixed by RNA viruses are essentially neutral or nearly neutral in character.
    keywords: acid; amino; et al; evolution; example; figure; fitness; genomes; hiv; human; immunodeficiency; mutations; number; proteins; rna; selection; sequence; substitutions; variation; virus; viruses; vivo
       cache: cord-022348-w7z97wir.txt
  plain text: cord-022348-w7z97wir.txt

        item: #34 of 119
          id: cord-022494-d66rz6dc
      author: Webb, B.
       title: Comparative Modeling of Drug Target Proteins
        date: 2014-10-01
       words: 8784
      flesch: 45
     summary: [19, 20] Computational protein structure prediction methods, such as threading [21] and comparative protein structure modeling [22, 23], strive to bridge the sequence-structure gap by utilizing these evolutionary relationships. [9] Shown are the different ranges of applicability of comparative protein structure modeling, threading, and de novo structure prediction, their corresponding accuracies, and their sample applications.
    keywords: accuracy; alignment; docking; drug; errors; identity; ligand; methods; modeling; models; protein; sequence; structure; target; template
       cache: cord-022494-d66rz6dc.txt
  plain text: cord-022494-d66rz6dc.txt

        item: #35 of 119
          id: cord-023208-w99gc5nx
      author: None
       title: Poster Presentation Abstracts
        date: 2006-09-01
       words: 71178
      flesch: 41
     summary: Peptide structures can be approached by spectroscopy and NMR techniques but data from these approaches too frequently diverge. To increase the stability and the therapeutic efficacy of peptide sequences from myelin oligodendrocyte protein (MOG) that act as multiple sclerosis (MS) antigens, we grafted them onto a framework of a particularly stable class of peptides, the cyclotides.
    keywords: acid; activation; activity; affinity; aggregation; aim; alpha; amino; amino acid; analogues; analysis; approach; arg; assay; beta; binding; blood; bond; cancer; cell; chain; chemical; chemistry; complex; complexes; compounds; concentration; conformational; conjugates; cyclic; data; derivatives; design; development; disulfide; dna; domain; effect; enzyme; epitope; fmoc; fragment; gly; group; growth; hplc; human; inhibitors; integrin; interaction; ligands; mechanism; membrane; method; mice; model; molecular; molecules; native; new; nmr; non; novel; opioid; order; patients; peptide; peptide analogues; peptide chain; peptide synthesis; phase; phase peptide; phe; position; potential; prepared; presence; products; proline; properties; protein; reaction; receptor; residues; results; role; sequence; site; solution; specific; spectroscopy; stability; strategy; structure; studies; study; surface; synthetic; system; target; terminal; therapeutic; treatment; tumor; turn; type; tyr; use; vivo; water; work
       cache: cord-023208-w99gc5nx.txt
  plain text: cord-023208-w99gc5nx.txt

        item: #36 of 119
          id: cord-023209-un2ysc2v
      author: None
       title: Poster Presentations
        date: 2008-10-07
       words: 112272
      flesch: 42
     summary: A specific bioassay was developed for screening peptide activity in high-salinity conditions in order to evaluate the inhibition of biofilm growth, based on growing biofilm-forming bacteria in a 96-well microtiter plate. Insight into the molecular mechanism of peptide activity is obtained in vitro using the SAXS method and artificial systems mimicking a bacterial cytoplasmic membrane.
    keywords: acid peptide; acid residues; acids; activation; activities; activity; affi; agents; aggregation; aim; ala; amide; amino; amino acid; amyloid; analogs; analogues; analysis; antimicrobial; application; approach; assay; backbone; binding; biological; blood; bond; brain; c peptide; cancer; cation; cell; chemical; chemistry; cient; city; coil; complex; compounds; concentration; conditions; confi; conformational; conjugates; coupling; cyclic; data; delivery; derivatives; design; development; diseases; dna; domain; drug; ed peptides; effect; effi; experiments; family; fmoc; formation; fragments; free; function; gly; group; helix; hplc; human; identifi; infl; inhibition; inhibitors; interaction; interest; leu; ligation; lipid; mass; mechanism; membrane; method; microwave; model; model peptides; modifi; modifi ed; molecules; native; natural; new; nity; nmr; non; novel; number; order; peptide; peptide analogues; peptide bond; peptide chain; peptide chemistry; peptide fragments; peptide library; peptide ligands; peptide sequence; peptide structure; peptide synthesis; phase peptide; phase synthesis; phe; position; positive; potential; presence; present; process; processes; properties; protein; purifi; range; reaction; receptor; recognition; region; report; residues; results; role; rst; selective; sequence; signifi; site; solution; specifi c; spectroscopy; stability; stable; strategy; structure; studies; study; surface; synthetic; system; target; terminal; terminus; therapy; time; treatment; trp; tumor; type; university; uorescence; uptake; use; vitro; vivo; water; work
       cache: cord-023209-un2ysc2v.txt
  plain text: cord-023209-un2ysc2v.txt

        item: #37 of 119
          id: cord-023647-dlqs8ay9
      author: None
       title: Sequences and topology
        date: 2003-03-21
       words: 4522
      flesch: 38
     summary: A 32-kDa Lipocortin from Human Mononuclear Cells Appears to be Identical with the Placental Inhibitor of Blood Coagulation Distinct Ferredoxins from Rhodobacter-Capsulatus -Complete Amino Acid Sequences and Molecular Evolution Peptide Sequence Analysis and Molecular Cloning Reveal Two Calcium Pump Isoforms in the Human Erythrocyte Membrane Cloning and Characterization of a Novel Member of the Cytochrome-P450 Subfamily IVA in Rat Prostate A Directly Repeated Sequence in the β-Globin Promoter Regulates Transcription in Murine Erythroleukemia Cells Isolation and Characterization of the Alkane-Inducible NADPH-Cytochrome-P-450 Oxidoreductase Gene from Candida-Tropicalis -Identification of Invariant Residues Within Similar Amino Acid Sequences of Divergent Flavoproteins Protein Kinase-C Inhibitor Proteins -Purification from Sheep Brain and Sequence Similarity to Lipocortins and 14-3-3 MCI~ AVEmL~ B& Sequence Homology Between Purple Acid Phosphatases and Phosphoprotein Phosphatases -Are Phosphoprotein Phosphatases Metalloproteins Containing Oxide-bridged Dinuclear Metal Centers Negative Regulation of the Human ε-Globin Gene by Transcriptional Interference: Role of an Alu Repetitive Element Amino Acid Sequence of Chicken Calsequestrin Deduced from cDNA -Comparison of Calsequestrin and Aspartactin Calsequestrin, an Intracellular Calcium-binding Protein of Skeletal Muscle Sarcoplasmic Reticulum, Is Homologous to Aspartactin, a Putative Laminin-binding Protein of the Extracellular Matrix Bovine Protein C Inhibitor with Structural and Functional Homologous ]~-.gtl~ to Human The 18S Ribosomal RNA Sequence of the Sea Anemone Anemonia sulcata and Its Evolutionary Position Among Other Eukaryotes Inferred from Sequence Comparisons of a Heat Shock Gene in Two Nematodes The l~'/O Multigene Family of Ok~hag of cDNA ~ for the ~ Omin of Human Complement Component ca~bi~una Protein, sequence Homology with the a C~t~:~a~h Proc Natl Acad Sci USA 1990 Highly Conserved Core Domain and Unique N Terminus with Presumptive Regulatory Motifs in a Human TATA Factor (TFIID)
    keywords: acid; amino; amino acid; analysis; cell; conserved; dna; domain; evolution; factor; family; gene; homology; human; member; new; novel; protein; rna; sequence; similarity; structure; virus; yeast
       cache: cord-023647-dlqs8ay9.txt
  plain text: cord-023647-dlqs8ay9.txt

        item: #38 of 119
          id: cord-025610-7vouj8pp
      author: Latif, Seemab
       title: Backward-Forward Sequence Generative Network for Multiple Lexical Constraints
        date: 2020-05-06
       words: 3924
      flesch: 44
     summary: However, generating sequence from pre-specified lexical constraints is a new, challenging and less researched area in NLG. Our proposed approach shows lower perplexity than CGMH sampling method for sentence generation through keywords/constraints 1 to 3, while with 4 constraints as input CGMH shows slightly better result than our approach of generating sequence with verb constraint and during inference replacing the words in sequence with closest embedding similarity.
    keywords: backward; constraints; forward; language; model; sequence; word
       cache: cord-025610-7vouj8pp.txt
  plain text: cord-025610-7vouj8pp.txt

        item: #39 of 119
          id: cord-025948-6dsx7pey
      author: Maitra, Arindam
       title: Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility
        date: 2020-06-04
       words: 7221
      flesch: 48
     summary: Viral RNA sequences obtained from two samples S11 and S12 shared all mutations except a V32L mutation at ORF8 harboured by S11 and not by S12. The sequencing reads obtained in shotgun RNA-Seq experiment were mapped to reference viral sequence, variants detected and consensus sequence for each sample built using Dragen RNA pathogen detection software (version 9) in BaseSpace (Illumina Inc, USA).
    keywords: binding; chain; clade; cov-2; d614; genome; india; mirnas; mutations; protein; rna; samples; sars; sequences
       cache: cord-025948-6dsx7pey.txt
  plain text: cord-025948-6dsx7pey.txt

        item: #40 of 119
          id: cord-027316-echxuw74
      author: Modarresi, Kourosh
       title: Detecting the Most Insightful Parts of Documents Using a Regularized Attention-Based Model
        date: 2020-05-22
       words: 2117
      flesch: 35
     summary: Deep Learning Summit Standardization of featureless variables for machine learning models using natural language processing Generalized variable conversion using k-means clustering and web scraping An efficient deep learning model for recommender systems Effectiveness of Representation Learning for the Analysis of Human Behavior An evaluation metric for content providing models, recommendation systems, and online campaigns Combined Loss Function for Deep Convolutional Neural Networks A Randomized Algorithm for the Selection of Regularization Parameter. A neural probabilistic language model Theano: a CPU and GPU math expression compiler Audio chord recognition with recurrent neural networks A singular value thresholding algorithm for matrix completion Exact matrix completion via convex optimization Compressive sampling Long short-term memory-networks for machine reading Learning phrase representations using RNN encoder-decoder for statistical machine translation Framewise phoneme classification with bidirectional LSTM and other neural network architectures Generating sequences with recurrent neural networks The Elements of Statistical Learning: Data Mining, Inference, and Prediction Handwritten digit recognition via deformable prototypes 'Gene Shaving' as a method for identifying distinct sets of genes with similar expression patterns Matrix Completion via Iterative Soft-Thresholded SVD Package 'impute'.
    keywords: embedding; encoder; learning; model; neural; translation
       cache: cord-027316-echxuw74.txt
  plain text: cord-027316-echxuw74.txt

        item: #41 of 119
          id: cord-031957-df4luh5v
      author: dos Santos-Silva, Carlos André
       title: Plant Antimicrobial Peptides: State of the Art, In Silico Prediction and Perspectives in the Omics Era
        date: 2020-09-02
       words: 16639
      flesch: 34
     summary: Thus, there is a need for computational framework methods to predict protein structures based on the knowledge of the sequence. In addition, in recent years, there has been impressive progress in the development of algorithms for protein folding that may aid in the prediction of protein structures from amino acid sequence information.
    keywords: acid; activity; amps; analysis; antifungal; approaches; binding; bonds; cysteine; database; defensins; disulfide; docking; family; figure; function; gene; identification; information; lipid; methods; modeling; models; motif; novel; pathogen; peptides; plant; potential; prediction; present; protein; residues; sequence; structure
       cache: cord-031957-df4luh5v.txt
  plain text: cord-031957-df4luh5v.txt

        item: #42 of 119
          id: cord-033010-o5kiadfm
      author: Durojaye, Olanrewaju Ayodeji
       title: Potential therapeutic target identification in the novel 2019 coronavirus: insight from homology modeling and blind docking study
        date: 2020-10-02
       words: 8149
      flesch: 48
     summary: Qualitative Model Energy Analysis (QMEAN) is a composite scoring function that describes protein structures on the basis of major geometrical aspects. A novel coronavirus and SARS Crystal structures of the main peptidase from the SARS coronavirus inhibited by a substrate-like aza-peptide epoxide Dissection study on the SARS 3C-like protease reveals the critical role of the extra domain in dimerization of the enzyme: defining the extra domain as a new target for design of highly-specific protease inhibitors 3C-like proteinase from SARS coronavirus catalyzes substrate hydrolysis by a general base mechanism Only one protomer is active in the dimer of SARS 3C-like proteinase Biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus 3C-like proteinase A trial of lopinavir-ritonavir in adults hospitalized with severe covid-19 EMBOSS: the European molecular biology open software suite SRS, an indexing and retrieval tool for flat file data libraries Issues in bioinformatics benchmarking: the case study of multiple sequence alignment HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 Toward the estimation of the absolute quality of individual protein structure models MolProbity: more and better reference data for improved all-atom structure validation Chapter 2: Protein Composition and Structure Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology UCSF Chimera: a visualization system for exploratory research and analysis Fasman GD (1974) Prediction of protein conformation Protein Identification and Analysis Tools on the ExPASy Server The rapid generation of mutation data matrices from protein sequences MEGA7:
    keywords: 2019; acid; amino; amino acid; binding; coronavirus; docking; model; ncov; protein; proteinase; sars; score; sequence; structure; target protein; template
       cache: cord-033010-o5kiadfm.txt
  plain text: cord-033010-o5kiadfm.txt

        item: #43 of 119
          id: cord-035033-osjy88rc
      author: Aydin, Berkay
       title: Spatiotemporal event sequence discovery without thresholds
        date: 2020-11-09
       words: 8236
      flesch: 50
     summary: In this work, we focus on spatiotemporal event sequences (STES) from event datasets that contain instances with region-based geometric representations. Given this information, the task of STES mining, in general, is interested in discovering spatiotemporal event sequences whose instance sequences are frequently repeated.
    keywords: algorithm; data; datasets; event; event sequences; follow; instances; mining; sequences; stess; threshold; time; values
       cache: cord-035033-osjy88rc.txt
  plain text: cord-035033-osjy88rc.txt

        item: #44 of 119
          id: cord-102766-n6mpdhyu
      author: Alam, Md. Nafis Ul
       title: Short k-mer Abundance Profiles Yield Robust Machine Learning Features and Accurate Classifiers for RNA Viruses
        date: 2020-06-25
       words: 3202
      flesch: 47
     summary: It has been demonstrated that RNA-Seq data can be a very promising avenue for improving knowledge on RNA viruses when leveraged by tactful algorithms Viral metagenomics Third generation sequencing: technology and its potential impact on evolutionary biodiversity research Virus taxonomy: the database of the International Committee on Nucleic Acids Research Accelerated Profile HMM Searches BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data Bridger: a new framework for de novo transcriptome assembly using RNA-seq data Shannon: An Information-Optimal de Novo RNA-Seq Assembler rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads De novo assembly and analysis of RNA-seq data Full-length transcriptome assembly from RNA-Seq data without a reference genome Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels S1 Fig.
    keywords: data; feature; genomes; learning; machine; models; rna; sequence; viruses
       cache: cord-102766-n6mpdhyu.txt
  plain text: cord-102766-n6mpdhyu.txt

        item: #45 of 119
          id: cord-103029-nc5yf6x4
      author: Wichmann, Stefan
       title: Computational design of genes encoding completely overlapping protein domains: Influence of genetic code and taxonomic rank
        date: 2020-09-25
       words: 8666
      flesch: 47
     summary: Other properties required for functional protein sequences can be inferred from the evolutionary information contained in sequence alignments of protein families. Constructed OLG sequences are also indistinguishable from natural sequences in terms of amino acid identity and secondary structure, while the minimum nucleotide change required for overprinting an overlapping sequence can be as low as 1.8% of the sequence.
    keywords: acid; amino; code; fig; genes; olg; olgs; overlapping; protein; sequences; structure
       cache: cord-103029-nc5yf6x4.txt
  plain text: cord-103029-nc5yf6x4.txt

        item: #46 of 119
          id: cord-103297-4stnx8dw
      author: Widrich, Michael
       title: Modern Hopfield Networks and Attention for Immune Repertoire Classification
        date: 2020-08-17
       words: 14116
      flesch: 51
     summary: A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning Explaining and interpreting LSTMs Solving the protein sequence metric problem Rank-loss support instance machines for miml instance annotation Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires Multiple instance learning: a survey of problem characteristics and applications VDJServer: a cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements Tetramer-visualized gluten-specific CD4+ T cells in blood as a potential diagnostic marker for coeliac disease without oral gluten challenge iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories Support-vector networks Quantifiable predictive features define epitope-specific T cell receptor repertoires On a model of associative memory with huge storage capacity BERT: pre-training of deep bidirectional transformers for language understanding Solving the multiple instance problem with axis-parallel rectangles Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire Predicting antigen-specificity of single T-cells based on TCR CDR3 regions. We apply random and attention-based subsampling of repertoire sequences to reduce over-fitting and decrease computational effort.
    keywords: attention; classification; data; datasets; deeprc; et al; hopfield; input; learning; lstm; methods; motif; networks; number; repertoire; search; sequences; table
       cache: cord-103297-4stnx8dw.txt
  plain text: cord-103297-4stnx8dw.txt

        item: #47 of 119
          id: cord-193356-hqbstgg7
      author: None
       title: cord-193356-hqbstgg7
        date: None
       words: 14115
      flesch: 51
     summary: A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning Explaining and interpreting LSTMs Solving the protein sequence metric problem Rank-loss support instance machines for miml instance annotation Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires Multiple instance learning: a survey of problem characteristics and applications VDJServer: a cloud-based analysis portal and data commons for immune repertoire sequences and rearrangements Tetramer-visualized gluten-specific CD4+ T cells in blood as a potential diagnostic marker for coeliac disease without oral gluten challenge iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories Support-vector networks Quantifiable predictive features define epitope-specific T cell receptor repertoires On a model of associative memory with huge storage capacity BERT: pre-training of deep bidirectional transformers for language understanding Solving the multiple instance problem with axis-parallel rectangles Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire Predicting antigen-specificity of single T-cells based on TCR CDR3 regions. We apply random and attention-based subsampling of repertoire sequences to reduce over-fitting and decrease computational effort.
    keywords: attention; classification; data; datasets; deeprc; et al; hopfield; input; learning; lstm; methods; motif; networks; number; repertoire; search; sequences; table
       cache: cord-193356-hqbstgg7.txt
  plain text: cord-193356-hqbstgg7.txt

        item: #48 of 119
          id: cord-193910-7p3f3znj
      author: Zhang, Xiangxie
       title: Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification
        date: 2020-11-01
       words: 7746
      flesch: 59
     summary: In the experiments, the performances of feature extraction using primers and random DNA sequences will be compared to several other machine learning approaches. Since 37 primers of HCV were acquired, we generated three groups of random DNA sequences, and each contains 37 DNA sequences.
    keywords: data; dna; dna sequences; extraction; feature; learning; method; model; results; sequences; string
       cache: cord-193910-7p3f3znj.txt
  plain text: cord-193910-7p3f3znj.txt

        item: #49 of 119
          id: cord-203232-1nnqx1g9
      author: Canturk, Semih
       title: Machine-Learning Driven Drug Repurposing for COVID-19
        date: 2020-06-25
       words: 5028
      flesch: 49
     summary: Using the National Center for Biotechnology Information virus protein database and the DrugVirus database, which provides a comprehensive report of broad-spectrum antiviral agents (BSAAs) and viruses they inhibit, we trained ANN models with virus protein sequences as inputs and antiviral agents deemed safe-in-humans as outputs. This undermined our assumption that drug trials are hierarchical; though, in reality this is usually the case.
    keywords: acid; amino; antivirals; cov-2; database; dataset; drug; models; sars; sequences; virus
       cache: cord-203232-1nnqx1g9.txt
  plain text: cord-203232-1nnqx1g9.txt

        item: #50 of 119
          id: cord-213136-euv6pqh5
      author: Singh, Kulveer
       title: Sequence Effects on Internal Structure of Droplets of Associative Polymers
        date: 2020-05-17
       words: 4331
      flesch: 51
     summary: Similar time evolution is observed in all other systems with different polymer sequences and in all cases the time it takes a single droplet to form is below 20,000. As we have shown before, this choice of interaction parameters guarantees phase separation via formation of polymer droplets.
    keywords: droplet; polymer; sequences; solvent; stickers
       cache: cord-213136-euv6pqh5.txt
  plain text: cord-213136-euv6pqh5.txt

        item: #51 of 119
          id: cord-252347-vnn4135b
      author: Lee, Wai-Ming
       title: A Diverse Group of Previously Unrecognized Human Rhinoviruses Are Common Causes of Respiratory Illnesses in Infants
        date: 2007-10-03
       words: 5718
      flesch: 46
     summary: Selection of the target region To identify a genomic region suitable for molecular typing of HRV, we analyzed all published HRV sequences. These results suggested HRV serotypes are stable and do not undergo influenza virus-like antigenic drift [7].
    keywords: hrv; hrvs; human; new; pcr; region; sequences; serotypes; strains
       cache: cord-252347-vnn4135b.txt
  plain text: cord-252347-vnn4135b.txt

        item: #52 of 119
          id: cord-253436-dz84icdc
      author: Wille, Michelle
       title: High Prevalence and Putative Lineage Maintenance of Avian Coronaviruses in Scandinavian Waterfowl
        date: 2016-03-03
       words: 2020
      flesch: 46
     summary: Influenza A virus, avian paramyxovirus and avian coronavirus Multiple Alignment of DNA Sequences with MAFFT Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods Estimating maximum likelihood phylogenies with PhyML SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building FigTree v1.1.1: Tree figure drawing tool Diverse gammacoronaviruses detected in wild birds from Madagascar Detection and molecular characterization of infectious bronchitis-like viruses in wild bird populations Genetically diverse coronaviruses in wild bird populations of northern England Molecular identification and characterization of novel coronaviruses infecting graylag geese (Anser anser), feral pigeons (Columbia livia) and mallards (Anas platyrhynchos) Identification of avian coronavirus in wild aquatic birds of the central and eastern USA Surveillance of avian coronaviruses in wild bird populations of Korea Absence of coronaviruses, paramyxoviruses, and influenza A viruses in seabirds in the southwestern Indian Ocean Animal migration and infectious disease risk Juveniles and migrants as drivers for seasonal epizootics of avian influenza virus Global patterns of influenza a virus in wild birds The evolutionary genetics and emergence of avian influenza A viruses in wild birds Spatial, temporal, and species variation in prevalence of influenza A viruses in wild migratory birds We wish to thank and the duck trappers at Ottenby Bird Observatory and Jonas Waldenström for collecting and providing samples used in this study, Jonas Blomberg for kindly providing sequence, Mallard CoV sequences generated in this study are indicated with a filled circle and Scaup CoV sequences with an asterisk. We found a prevalence of 18.7% CoV, which is higher than the 0-15% reported previously in wild bird studies [11, 14, 15, [21]
    keywords: coronaviruses; cov; prevalence; sequences; species
       cache: cord-253436-dz84icdc.txt
  plain text: cord-253436-dz84icdc.txt

        item: #53 of 119
          id: cord-254942-g51mjj2b
      author: Touati, Rabeb
       title: New methodology for repetitive sequences identification in human X and Y chromosomes
        date: 2020-10-19
       words: 7718
      flesch: 49
     summary: Two-thirds of the human genome consists of repetitive DNA sequences. The identification of repetitive DNA sequences is becoming increasingly important.
    keywords: chromosomes; dna; dna sequences; fig; genome; human; image; patterns; repeat; scalogram; sequences; tandem
       cache: cord-254942-g51mjj2b.txt
  plain text: cord-254942-g51mjj2b.txt

        item: #54 of 119
          id: cord-255194-4i9fc0r7
      author: Djikeng, Appolinaire
       title: Viral genome sequencing by random priming methods
        date: 2008-01-07
       words: 3778
      flesch: 46
     summary: A cutoff e-value of 10^-25 was used to identify viral sequences which matched the reference genome. The work presented here demonstrates the utility of the random genome sequencing method for the generation of viral sequence from positive-strand ssRNA (Human Rhinovirus, Turkey astrovirus) and negative-strand ssRNA viruses (Newcastle disease virus), ssDNA (enterobacteria phage M13) and dsDNA viruses (woodchuck hepatitis virus and lambda phage).
    keywords: coverage; genome; method; sequence; sequencing; sispa; viral; viruses
       cache: cord-255194-4i9fc0r7.txt
  plain text: cord-255194-4i9fc0r7.txt

        item: #55 of 119
          id: cord-255371-o9oxchq6
      author: Nguyen, Thanh Thi
       title: Genomic Mutations and Changes in Protein Secondary Structure and Solvent Accessibility of SARS-CoV-2 (COVID-19 Virus)
        date: 2020-07-10
       words: 5655
      flesch: 52
     summary: For the mutation detection purpose, we apply a dynamic programming algorithm to protein AA sequences to get global pairwise alignments between a reference sequence and a query sequence. There have been various protein secondary structure prediction programs in the literature and many of those were developed based on artificial intelligence models using protein AA sequences such as JPred4 [29], Spider2
    keywords: accessibility; cov-2; gene; mutations; number; protein; sars; sequences; solvent; structure; virus
       cache: cord-255371-o9oxchq6.txt
  plain text: cord-255371-o9oxchq6.txt

        item: #56 of 119
          id: cord-256278-jvfjf7aw
      author: Feng, Jie
       title: New method for comparing DNA primary sequences based on a discrimination measure
        date: 2010-10-21
       words: 2868
      flesch: 42
     summary: Analysis of genomic sequences by chaos game representation Universal sequence map (USM) of arbitrary discrete sequences Computing distribution of scale independent motifs in biological sequences Biological sequences as pictures: a generic two dimensional solution for iterated maps A measure of similarity of sets of sequences not requiring sequence alignment Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarities of natural sequences Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders Exploration of phylogenetic data using a global sequence analysis method Shared information and program plagiarism detection Algorithmic clustering of music based on string compression Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison Genomic signature: characterization and classification of species assessed by chaos game representation of sequences Detection and characterization of horizontal transfers in prokaryotes using genomic signature H curves, a novel method of representation of nucleotides series especially suited for long DNA sequences Characteristic sequences for DNA primary sequence Metrics for comparing regulatory sequences on the basis of pattern counts Chaos game representation of gene structure Chaos game representation for comparison of whole genomes A statistical method for alignment free comparison of regulatory sequences Dinucleotide relative abundance extremes: a genomic signature Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances Directed graphs of DNA sequences and their numerical characterization 2-D graphical representation of protein sequences and its application to coronavirus phylogeny An information based sequence distance and its application to whole mitochondrial genome phylogeny A 2D graphical representation of DNA sequence A relative similarity measure for the similarity analysis of DNA sequences Characteristic distribution of L-tuple for DNA primary sequence An extension of the Burrows-Wheeler transform Distance measures for biological sequences: some recent approaches A new graphical representation and analysis of DNA sequence structure A new sequence distance measure for phylogenetic tree construction Improved tools for biological sequence comparison Spectral distortion measures for biological sequence comparisons and database searching A probabilistic measure for alignment-free sequence comparison Evolutionary implications of microbial genome tetranucleotide frequency biases Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach New 3D graphical representation of DNA sequence based on dual nucleotides On the similarity of DNA primary sequences On the characterization of DNA primary sequences by triplet of nucleic acid bases Novel 2-D graphical representation of DNA sequences and their numerical characterization Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation Quantifying the species-specificity in genomic signatures, synonymous codon choice, amino acid usage and G+C content Statistical analysis of L-tuple frequencies in eubacteria and organelles Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human Integrated gene and species phylogenies from unaligned whole genome protein sequences Application of tetranucleotide frequencies for the assignment of genomic fragments Alignment-free sequence comparison: a review The spectrum of genomic signatures: from dinucleotides to chaos game representation A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words Statistical measures of DNA dissimilarity under Markov chain models of base composition The Burrows-Wheeler similarity distribution between biological sequences based on Burrows-Wheeler transform TN curve: a novel 3D graphical representation of DNA sequence based on trinucleotides and its applications The Z curve database: a graphic representation of genome sequences Coronavirus phylogeny based on a geometric approach We thank all the anonymous referees for their valuable suggestions and support. In the first group, researchers represent DNA sequence by curves (Hamori and Ruskin, 1983; Nandy, 1994; Randic et al., 2003a; Zhang et al., 2003; Liao, 2005; Li et al., 2006; Qi et al., 2007; Yu et al., 2009), numerical sequences (He and Wang, 2002), or matrices (Randic, 2000; Randic et al., 2001).
    keywords: discrimination; dna; method; representation; sequences
       cache: cord-256278-jvfjf7aw.txt
  plain text: cord-256278-jvfjf7aw.txt

        item: #57 of 119
          id: cord-256608-ajzk86rq
      author: van Weezep, Erik
       title: PCR diagnostics: In silico validation by an automated tool using freely available software programs
        date: 2019-05-13
       words: 4953
      flesch: 48
     summary: To increase the accuracy of the alignment search (see Discussion), large sequences were fragmented into sequences of at most 3000 nucleotides with an overlap of 50 nucleotides to prevent the loss of hits of primer or probe sequences spanning the split site. Primer and probe sequences were inserted in all possible combinations and orientations potentially initiating amplification (Fig. 1).
    keywords: pcr; pcrv; primer; probe; sequences; silico; validation; virus
       cache: cord-256608-ajzk86rq.txt
  plain text: cord-256608-ajzk86rq.txt

        item: #58 of 119
          id: cord-263987-ff6kor0c
      author: Holmes, Ian H.
       title: Solving the master equation for Indels
        date: 2017-05-12
       words: 7132
      flesch: 39
     summary: Parameterizing sequence alignment with an explicit evolutionary model Multiple genome rearrangement and breakpoint phylogeny Analytical expression of the purine/pyrimidine codon probability after and before random mutations Analytical solutions of the dinucleotide probability after and before random mutations RNA secondary structure prediction using stochastic context-free grammars and evolutionary history Evolution probabilities and phylogenetic distance of dinucleotides Genome evolution by transformation, expansion and contraction (GETEC) An evolutionary model for maximum likelihood alignment of DNA sequences An introduction to probability theory and its applications Evolutionary HMMs: a Bayesian approach to multiple alignment Using guide trees to construct multiple-sequence evolutionary HMMs Accurate reconstruction of insertion-deletion histories by statistical phylogenetics A note on probabilistic models over strings: the linear algebra approach Statistical alignment based on fragment insertion and deletion models Evolutionary inference via the poisson indel process Inching toward reality: an improved likelihood model of sequence evolution Models of sequence evolution for DNA sequences containing gaps Evolutionary models for insertions and deletions in a probabilistic modeling framework Probabilistic phylogenetic inference with insertions and deletions A probabilistic model for the evolution of RNA structure Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures A probabilistic model for sequence alignment with context-sensitive indels Sequence alignments and pair hidden Markov models using evolutionary history Joint Bayesian estimation of alignment and phylogeny BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny Incorporating indel information into phylogeny estimation for rapidly emerging pathogens Phylogenetic automata, pruning, and multiple alignment Hand Align: Bayesian multiple sequence alignment, phylogeny, and ancestral reconstruction A long indel model for evolutionary sequence alignment An improved model for statistical alignment Chain Monte Carlo Expectation Maximization algorithm for statistical analysis of DNA sequence evolution with neighbor-dependent substitution rates Accurate estimation of substitution rates with neighbor-dependent models in a phylogenetic context Patterns of insertion and deletion in mammalian genomes Exhaustive matching of the entire protein sequence database Pattern and rate of indel evolution inferred from whole chloroplast intergenic regions in sugarcane, maize and rice Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes For indel models, this latent information must be extended to include hidden site boundaries [56].
    keywords: alignment; distributions; evolution; finite; gap; indel; length; matrix; models; probability; sequence; state; time
       cache: cord-263987-ff6kor0c.txt
  plain text: cord-263987-ff6kor0c.txt

        item: #59 of 119
          id: cord-264135-s2u76pvk
      author: Patel, Amrutlal K.
       title: Complete genome sequence analysis of chicken astrovirus isolate from India
        date: 2016-12-23
       words: 3756
      flesch: 36
     summary: Authors: Patel, Amrutlal K.; Pandit, Ramesh J.; Thakkar, Jalpa R.; Hinsu, Ankit T.; Pandey, Vinod C.; Pal, Joy K.; Prajapati, Kantilal S.; Jakhesara, Subhash J.; Joshi, Chaitanya G. Journal: Vet Res Commun. DOI: 10.1007/s11259-016-9673-6. OBJECTIVE: A 7,513 bp consensus genome sequence of the Indian isolate of chicken astrovirus was obtained after assembly of 14,121 high-quality reads.
    keywords: analysis; astrovirus; capsid; castv; chicken; genome; isolate; protein; sequence
       cache: cord-264135-s2u76pvk.txt
  plain text: cord-264135-s2u76pvk.txt

        item: #60 of 119
          id: cord-264296-0x90yubt
      author: Sawmya, Shashata
       title: Analyzing hCov genome sequences: Applying Machine Intelligence and beyond
        date: 2020-06-03
       words: 5017
      flesch: 56
     summary: Thus, every resulting time-step represents a date (Tk for Cluster k) and contains the clusters of genome sequences of the countries/states. Notably, we do not consider any alignment-based method since it is not computationally feasible for us to align thousands of viral sequences for analysis and clustering purposes [4].
    keywords: analysis; coronavirus; countries; features; genome; learning; pipeline; sars; sequences; strain; tree
       cache: cord-264296-0x90yubt.txt
  plain text: cord-264296-0x90yubt.txt

        item: #61 of 119
          id: cord-264746-gfn312aa
      author: Muse, Spencer
       title: GENOMICS AND BIOINFORMATICS
        date: 2012-03-29
       words: 10983
      flesch: 54
     summary: In addition to providing storage and retrieval of gene sequences, several of these databases also offer advanced sequence analysis methods and powerful visualization tools. However, if two or more such distantly related organisms have gene sequences that are nearly identical, a strong argument can be made that the gene is critical in both organisms and that the same function has been maintained throughout evolutionary history.
    keywords: alignment; data; database; dna; expression; figure; gene; genome; genomic; human; levels; nucleotides; number; protein; rna; sequence
       cache: cord-264746-gfn312aa.txt
  plain text: cord-264746-gfn312aa.txt

        item: #62 of 119
          id: cord-265857-fs6dj3dp
      author: Liu, Yu-Tsueng
       title: Infectious Disease Genomics
        date: 2010-12-24
       words: 4346
      flesch: 33
     summary: S-OIV emerged in the spring of 2009 in Mexico and was also discovered in specimens from two unrelated children in the San Diego area in April 2009 (CDC, 2009; Dawood et al., 2009) . S-OIV has three genome segments (HA, NP, NS) from the classic North American swine (H1N1) lineage, two segments (PB2, PA) from the North American avian lineage, one segment (PB1) from the seasonal H3N2, and most notably, two segments (NA, M) from the Eurasian swine (H1N1) lineage (Dawood et al., 2009) .
    keywords: disease; et al; genome; human; malaria; mosquito; sequence; sequencing; vaccine; vector; virus
       cache: cord-265857-fs6dj3dp.txt
  plain text: cord-265857-fs6dj3dp.txt

        item: #63 of 119
          id: cord-266288-buc4dd5y
      author: Dong, Rui
       title: A Novel Approach to Clustering Genome Sequences Using Inter-nucleotide Covariance
        date: 2019-04-09
       words: 5255
      flesch: 55
     summary: Therefore, the method can distinguish different sequences and classify species into the correct clusters with higher accuracy and lower time cost.
    keywords: dataset; method; sequence; vector; viruses
       cache: cord-266288-buc4dd5y.txt
  plain text: cord-266288-buc4dd5y.txt

        item: #64 of 119
          id: cord-266794-oyppubq5
      author: Zhang, Dachuan
       title: SARS2020: An integrated platform for identification of novel coronavirus by a consensus sequence-function model
        date: 2020-09-01
       words: 1019
      flesch: 41
     summary: For sequence function annotation, the family classification method captures common properties from the samples, extracts their feature vectors using machine learning algorithms, and then merges the sequences into clusters or families. These predicted functions will provide a valuable reference for further study of the biological activity and pathogenesis of 2019-nCoV. We built an integrated platform to assist 2019-nCoV research, and we proposed a novel consensus sequence-function model for using genome sequence data to identify unknown species.
    keywords: 2019; function; ncov; sequence
       cache: cord-266794-oyppubq5.txt
  plain text: cord-266794-oyppubq5.txt

        item: #65 of 119
          id: cord-266960-kyx6xhvj
      author: Temple, Mark D.
       title: Real-time audio and visual display of the Coronavirus genome
        date: 2020-10-02
       words: 6781
      flesch: 52
     summary: This paper demonstrates that sonification of RNA sequence data may also be useful for understanding how the genome functions. During this time, a large body of evidence has arisen regarding RNA sequence homology to other SARS-like virus strains.
    keywords: audio; data; display; genome; reading; region; rna; sequence; sonification; transcription; translation
       cache: cord-266960-kyx6xhvj.txt
  plain text: cord-266960-kyx6xhvj.txt

        item: #66 of 119
          id: cord-267500-x3u9i1vq
      author: Rose, Rebecca
       title: Challenges in the analysis of viral metagenomes
        date: 2016-08-03
       words: 5929
      flesch: 26
     summary: Automatic pipelines which combine various homology search strategies to identify a final set of viral reads include VirusHunter (Zhao et al. 2013) , a Perl script that automates viral identification using BLAST prior to assembly; MetaVir (Roux et al. 2011) , a web application that compares users' datasets to published viral sequences; and VirSorter (Roux et al. 2015) , which identifies prophages and viruses by comparison with custom datasets. Various software tools have been developed to accommodate the unique challenges and use cases associated with characterizing viral sequences; however, the quality of these tools varies, and their use often necessitates computing expertise or access to powerful computers, thus limiting their usefulness to many researchers.
    keywords: analysis; approaches; assembly; data; et al; genomes; graph; novo; reads; sequences; sequencing; tools
       cache: cord-267500-x3u9i1vq.txt
  plain text: cord-267500-x3u9i1vq.txt

        item: #67 of 119
          id: cord-268467-btfz6ye8
      author: Schreiber, Steven S.
       title: Sequence analysis of the nucleocapsid protein gene of human coronavirus 229E
        date: 1989-03-31
       words: 5049
      flesch: 51
     summary: The mRNAs of coronaviruses contain a stretch of leader sequence which is derived from the 5'-end of the viral genome and exhibits homology with the intergenic consensus sequence (Budzilowicz et al., 1985).
    keywords: coronaviruses; hcv-229e; human; leader; mrna; nucleocapsid; protein; rna; sequence; virus
       cache: cord-268467-btfz6ye8.txt
  plain text: cord-268467-btfz6ye8.txt

        item: #68 of 119
          id: cord-268549-2lg8i9r1
      author: Dai, Qi
       title: Sequence comparison via polar coordinates representation and curve tree
        date: 2012-01-07
       words: 4368
      flesch: 49
     summary: At the same time, if the value of o is too large, curvature differences at small scales will be obscured, which is also not good for sequence representation. The curve tree was then constructed to characterize numerically the closed curve of a biological sequence, and sequences were compared by evaluating the distance between the curve tree of the query sequence and the corresponding curve tree of the template sequence.
    keywords: curve; dna; et al; randic; representation; sequences; tree
       cache: cord-268549-2lg8i9r1.txt
  plain text: cord-268549-2lg8i9r1.txt

        item: #69 of 119
          id: cord-274056-9t3kneoo
      author: Abd Elwahaab, Marwa A.
       title: A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector
        date: 2019-05-08
       words: 3315
      flesch: 55
     summary: In our work, a representative of each of three groups of protein sequences is introduced.
    keywords: dissimilarity; group; protein; sequences; similarity; vector
       cache: cord-274056-9t3kneoo.txt
  plain text: cord-274056-9t3kneoo.txt

        item: #70 of 119
          id: cord-275258-azpg5yrh
      author: Mead, Dylan J.T.
       title: Visualization of protein sequence space with force-directed graphs, and their application to the choice of target-template pairs for homology modelling
        date: 2019-07-26
       words: 6335
      flesch: 48
     summary: As the taxonomical distance increases, production of high-quality homology models becomes more difficult. Criteria (rationale): human-infective virus (importance to human health); NCBI RefSeq annotated genome (easy retrieval of a high-quality RdRP sequence); RdRP located at the 3′ end of the polyprotein or on its own segment (eliminates unconventional RdRPs).
    keywords: genus; homology; modelling; models; quality; rdrp; sequence; structure; table; target; template
       cache: cord-275258-azpg5yrh.txt
  plain text: cord-275258-azpg5yrh.txt

        item: #71 of 119
          id: cord-279528-41atidai
      author: Abo-Elkhier, Mervat M.
       title: Measuring Similarity among Protein Sequences Using a New Descriptor
        date: 2019-11-22
       words: 3048
      flesch: 52
     summary: Graphical representation is a simple way to visualize protein sequences.
    keywords: amino; protein; representation; sequences; similarity; table
       cache: cord-279528-41atidai.txt
  plain text: cord-279528-41atidai.txt

        item: #72 of 119
          id: cord-280881-5o38ihe0
      author: Wlodawer, Alexander
       title: A model of tripeptidyl-peptidase I (CLN2), a ubiquitous and highly conserved member of the sedolisin family of serine-carboxyl peptidases
        date: 2003-11-11
       words: 4872
      flesch: 47
     summary: Figure 5: A homology-derived model of human CLN2. One of the symptoms of the disease is the accumulation of an autofluorescent material, ceroid-lipofuscin, in lysosomal storage bodies in various cell types, primarily in the nerv[…]. Figure 6: A model of the active site of human CLN2.
    keywords: cln2; conserved; enzymes; figure; human; kumamolisin; model; residues; sedolisin; sequence
       cache: cord-280881-5o38ihe0.txt
  plain text: cord-280881-5o38ihe0.txt

        item: #73 of 119
          id: cord-287634-64zqe4cz
      author: Al-Ssulami, Abdulrakeeb M.
       title: CodSeqGen: A tool for generating synonymous coding sequences with desired GC-contents
        date: 2020-01-31
       words: 2307
      flesch: 54
     summary: In this paper, we present an algorithmic solution to produce coding sequences that follow exactly a primary amino acid sequence and a desired GC-content. Although these tools generate random DNA and coding sequences, none of them can produce coding sequences for a given amino acid sequence and GC-content.
    keywords: amino; content; sequences
       cache: cord-287634-64zqe4cz.txt
  plain text: cord-287634-64zqe4cz.txt

        item: #74 of 119
          id: cord-287658-c2lljdi7
      author: Lopez-Rincon, Alejandro
       title: Classification and Specific Primer Design for Accurate Detection of SARS-CoV-2 Using Deep Learning
        date: 2020-09-10
       words: 4786
      flesch: 46
     summary: These methods rely on the assumption that cDNA sequences share common features and that the order of these features is preserved across different sequences [19, 20]. We then validate the discovered sequences on datasets not used during the training of the CNN, and show how to exploit them to create a novel, highly informative set of sequence features (e.g. viral sequences).
    keywords: bps; coronavirus; cov-2; data; learning; primer; samples; sars; sequences; set; virus
       cache: cord-287658-c2lljdi7.txt
  plain text: cord-287658-c2lljdi7.txt

        item: #75 of 119
          id: cord-291156-zxg3dsm3
      author: Bernasconi, Anna
       title: Empowering Virus Sequences Research through Conceptual Modeling
        date: 2020-05-01
       words: 4605
      flesch: 34
     summary: The manuscript is organized as follows: Section 2 overviews current technologies available for virus sequence data management. Many other resources link to viral sequence data, including: drug databases, particularly interesting as they provide information about clinical studies (see ClinicalTrials 10 ), protein sequences databases (e.g., UniProtKB/Swiss-Prot [32] ), and cell lines databases (e.g., Cellosaurus [3] ).
    keywords: cov2; covid-19; data; database; entity; genomic; information; model; sars; sequence; vcm; virus
       cache: cord-291156-zxg3dsm3.txt
  plain text: cord-291156-zxg3dsm3.txt

        item: #76 of 119
          id: cord-296691-cg463fbn
      author: Wang, Ren
       title: De novo Sequence Assembly and Characterization of Lycoris aurea Transcriptome Using GS FLX Titanium Platform of 454 Pyrosequencing
        date: 2013-04-09
       words: 5838
      flesch: 41
     summary: Hence, determination of the genetic pathways and specific genes involved in Amaryllidaceae alkaloids biosynthesis and some other aspects of Lycoris could be beneficial for humans and enrich our knowledge and understanding of functional genomics and biological research. For the purpose of improving mRNA abundance of genes related to Amaryllidaceae alkaloids biosynthesis, the leaves were treated with those abiotic elicitors for RNA extraction.
    keywords: alkaloids; amaryllidaceae; analysis; aurea; biosynthesis; cdna; galanthamine; genes; lycoris; molecular; sequences; sequencing; species; total; transcriptome
       cache: cord-296691-cg463fbn.txt
  plain text: cord-296691-cg463fbn.txt

        item: #77 of 119
          id: cord-300149-djclli8n
      author: Ruan, Yijun
       title: Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection
        date: 2003-05-24
       words: 4358
      flesch: 47
     summary: We compared sequence data generated from the library with human, mouse, and viral genome databases managed at the US National Center for Biotechnology Information. The basic local alignment search tool (BLAST) is a system for searching for similar sequences in all available sequence databases, whether the query is a DNA or a protein sequence. Associations between members of the Coronaviridae family and the SARS virus were assessed by comparing overlapping fragments of the SIN2500 genomic sequence against a database of coronavirus sequences.
    keywords: analysis; coronavirus; cov; genome; hotel; isolates; protein; rna; sars; sequence; singapore; spike
       cache: cord-300149-djclli8n.txt
  plain text: cord-300149-djclli8n.txt

        item: #78 of 119
          id: cord-300796-rmjv56ia
      author: None
       title: The signal sequence of the p62 protein of Semliki Forest virus is involved in initiation but not in completing chain translocation
        date: 1990-09-01
       words: 8108
      flesch: 49
     summary: However, the typical cytoplasmic orientation of the NH2-termini of membrane protein chains carrying a combined signal sequence-anchoring peptide suggests that signal sequences in general might direct translocation by inserting their hydrophobic, uncharged stretch of amino acid residues into the membrane in such an orientation that the NH2-terminus of the signal remains on the outside of the ER membrane. We consider it most unlikely that our results on p62 protein translocation are unique to the viral system and different from the general translocation process in the ER.
    keywords: chain; dhfr; et al; fig; glycosylation; membrane; p62; p62 protein; p62 signal; protein; region; sequence; signal; signal sequence; time; translocation
       cache: cord-300796-rmjv56ia.txt
  plain text: cord-300796-rmjv56ia.txt

        item: #79 of 119
          id: cord-300807-9u8idlon
      author: Tong, Joo Chuan
       title: 7 Infectious disease informatics
        date: 2013-12-31
       words: 2437
      flesch: 47
     summary: In cases where the ancestry of sequences is unclear, sequence alignment methods can be used to infer their phylogenetic relationships.
    keywords: acid; amino; diseases; selection; sequences; sites; substitution
       cache: cord-300807-9u8idlon.txt
  plain text: cord-300807-9u8idlon.txt

        item: #80 of 119
          id: cord-301827-a7hnuxy5
      author: Uversky, Vladimir N
       title: A decade and a half of protein intrinsic disorder: Biology still waits for physics
        date: 2013-04-29
       words: 20990
      flesch: 37
     summary: One study showed that the fraction of protein disorder was positively correlated both with measured RNA expression levels of E. coli genes in three different growth media and with predicted abundance levels of E. coli proteins.
    keywords: acid; amino; analysis; binding; cell; complex; diseases; disorder; disordered proteins; disordered regions; domains; evolution; fact; folding; function; idps; interactions; membrane; molten; p53; partners; protein structure; proteins; regions; regulation; residues; sequence; signaling; state; structure
       cache: cord-301827-a7hnuxy5.txt
  plain text: cord-301827-a7hnuxy5.txt

        item: #81 of 119
          id: cord-302161-ytr7ds8i
      author: Lutz, Mirjam
       title: FCoV Viral Sequences of Systemically Infected Healthy Cats Lack Gene Mutations Previously Linked to the Development of FIP
        date: 2020-07-24
       words: 9917
      flesch: 50
     summary: To track viral sequence mutations in organs of healthy FCoV carrier cats, we investigated FCoV sequences detected in the colon, liver, and thymus, as well as feces of seven experimentally FCoV infected cats. The overall comparison of FCoV gene sequences from the different cats euthanized at different time points after infection did not reveal any significant differences (Table 3) .
    keywords: cats; challenge; fcov; fecal; feline; fip; gene; infection; mutations; samples; sequences; study; tissue; virus
       cache: cord-302161-ytr7ds8i.txt
  plain text: cord-302161-ytr7ds8i.txt

        item: #82 of 119
          id: cord-302798-q0mbngqy
      author: Ge, Junwei
       title: Genomic characterization of circoviruses associated with acute gastroenteritis in minks in northeastern China
        date: 2018-06-14
       words: 4347
      flesch: 51
     summary: The examination of other MiCV sequences from different regions will help to assess the level of genetic diversity. Other sequences were obtained from GenBank; their accession numbers are included in the tree. […] to our knowledge of the pathogenic potential of MiCV and its association with mink enteritis if our results were corroborated by further reports.
    keywords: amino; analysis; batcv; circovirus; cvs; genome; micv; mink; nucleotide; sequence; tac; tat
       cache: cord-302798-q0mbngqy.txt
  plain text: cord-302798-q0mbngqy.txt

        item: #83 of 119
          id: cord-304607-td0776wj
      author: Paszkiewicz, Konrad H.
       title: Omics, Bioinformatics, and Infectious Disease Research
        date: 2010-12-24
       words: 7023
      flesch: 39
     summary: In addition, 21 nonannotated regions had clear levels of transcription and should therefore be considered genes (Passalacqua et al., 2009). Indeed, the first bacterial genomes sequenced were those of pathogens (Fraser et al., 1995; Tomb et al., 1997), and these were preceded by many bacteriophage genomes, such as those of bacteriophage MS2 (Fiers et al., 1976) and ϕX174 (Sanger et al., 1977), and by viral genomes (Fiers et al., 1978).
    keywords: analysis; assembly; bioinformatics; data; disease; et al; genes; genome; genomics; proteins; sequence; sequencing; species; vaccine
       cache: cord-304607-td0776wj.txt
  plain text: cord-304607-td0776wj.txt

        item: #84 of 119
          id: cord-304869-l6a68tqn
      author: Bielińska-Wąż, Dorota
       title: Graphical and numerical representations of DNA sequences: statistical aspects of similarity
        date: 2011-08-28
       words: 15415
      flesch: 58
     summary: Although q can easily be increased to higher orders, as we shall see, the similarity information is already specific enough up to the fourth order. Two bases belonging to different sequences, both located at the p-th position, are represented by a pair of numbers, {x_p, n_p}.
    keywords: alignment; bases; descriptors; dna sequences; example; fig; graphs; methods; representation; sequences; similarity; table
       cache: cord-304869-l6a68tqn.txt
  plain text: cord-304869-l6a68tqn.txt

        item: #85 of 119
          id: cord-306725-0vam15pt
      author: Li, Hao
       title: First detection and genomic characteristics of bovine torovirus in dairy calves in China
        date: 2020-05-09
       words: 3021
      flesch: 55
     summary: Nucleotide and deduced amino acid sequences were compared using the MegAlign program of Lasergene software, version 7.1 (DNASTAR, Madison, WI, USA). In this research, we determined the complete genome sequences of two BToV isolates from the same farm in Sichuan province, increasing the number of BToV genome sequences in the GenBank database to five and thus contributing to a better understanding of the genome structure and genetic evolution of BToV. Phylogenetic analysis indicated that these two BToV isolates are closely related genetically to strains from Japan.
    keywords: acid; amino; bovine; btov; complete; sequences; strains; torovirus
       cache: cord-306725-0vam15pt.txt
  plain text: cord-306725-0vam15pt.txt

        item: #86 of 119
          id: cord-310734-6v7oru2l
      author: Bolatti, Elisa M.
       title: A Preliminary Study of the Virome of the South American Free-Tailed Bats (Tadarida brasiliensis) and Identification of Two Novel Mammalian Viruses
        date: 2020-04-09
       words: 8482
      flesch: 37
     summary: The analysis also identified (although in low counts) viral sequences related to the family Alloherpesviridae, which infects fish and amphibians.
    keywords: analysis; bat; bats; brasiliensis; contigs; dna; families; gene; genome; metagenomic; novel; pairs; protein; read; rep; samples; sequence; species; tbrapv1; viruses
       cache: cord-310734-6v7oru2l.txt
  plain text: cord-310734-6v7oru2l.txt

        item: #87 of 119
          id: cord-311240-o0zyt2vb
      author: Motayo, Babatunde Olarenwaju
       title: Evolution and Genetic Diversity of SARSCoV-2 in Africa Using Whole Genome Sequences
        date: 2020-07-27
       words: 3104
      flesch: 42
     summary: There has been a paucity of data on the genetic evolution of SARSCoV-2 sequences from Africa, despite the increasing number of genome sequence submissions into the GISAID database from Africa; there were 97 whole genome sequences available in the GISAID database as at 24th April 2020. Results from our analysis showed recombination signals between the AfrSARSCoV-2 sequences and reference sequences within the N and S genes.
    keywords: africa; analysis; et al; genome; sarscov-2; sequences; virus
       cache: cord-311240-o0zyt2vb.txt
  plain text: cord-311240-o0zyt2vb.txt

        item: #88 of 119
          id: cord-311839-61djk4bs
      author: Wei, Dan
       title: A novel hierarchical clustering algorithm for gene sequences
        date: 2012-07-23
       words: 8046
      flesch: 58
     summary: Major algorithms used in gene sequence clustering can be divided into two categories according to the result format: hierarchical clustering algorithms and partitional clustering algorithms. We have applied mBKM with DMk in clustering gene sequences and performing phylogenetic analysis.
    keywords: alignment; clustering; data; distance; dmk; mbkm; measure; method; number; sequences; tuple
       cache: cord-311839-61djk4bs.txt
  plain text: cord-311839-61djk4bs.txt

        item: #89 of 119
          id: cord-321150-ev6acl7b
      author: Lam, Ha Minh
       title: Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm
        date: 2017-10-03
       words: 3189
      flesch: 44
     summary: To illustrate improved runtimes and memory usage of the new 3SEQ algorithm, we searched for recombinants among large sequence data sets of dengue virus serotype 2, Ebola virus, the coronavirus responsible for Middle-East Respiratory Syndrome (MERS) and Zika virus; see table 1. Ebola virus sequences were restricted to human viruses sampled in Africa after December 1, 2013.
    keywords: recombination; sequence; sites; virus
       cache: cord-321150-ev6acl7b.txt
  plain text: cord-321150-ev6acl7b.txt

        item: #90 of 119
          id: cord-321386-u1imic5l
      author: Li, Chun
       title: Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation
        date: 2018-02-17
       words: 5524
      flesch: 56
     summary: The results illustrated the better performance of our method. Identification of DNA-binding proteins using support vector machines and evolutionary profiles DNA-prot: identification of DNA binding proteins from protein sequence information using random forest iDNA-prot: identification of DNA binding proteins using random forest with grey model enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning gDNA-Prot: predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of Protein sequence Numerical characterization of protein sequences based on the generalized Chou's pseudo amino acid composition Light-directed synthesis of peptide nucleic acids (PNAs) chips Protein structure prediction from sequence variation Principles that govern the folding of protein chains Prediction of protein cellular attributes using pseudo-amino acid composition Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions Identify recombination spots with pseudo dinucleotide composition Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM Protein sequence comparison based on physicochemical properties and the position-feature energy matrix A Novel protein characterization based on pseudo amino acids composition and star-like graph topological indices Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses A computational approach to simplifying the protein folding problem Modeling study on the validity of a possibly simplified representation of proteins 2-D graphical representation of protein sequences and its application to coronavirus phylogeny Clustering of the protein design alphabets by using hierarchical self-organizing map A novel descriptor of protein sequences and its application BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences Amino acid difference formula to help explain protein Correlation analysis of some physical chemistry properties among genetic codons and amino acids Similarity analysis of protein sequences based on the normalized relative entropy On 3-D graphical representation of DNA primary sequences and their numerical characterization Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation Milestones in graphical bioinformatics Graphical representation of proteins Representation of proteins as walks in 20-D space Phylogenetic analysis of DNA sequences based on k-word and rough set theory On the characterization of DNA primary sequences by triplet of nucleic acid bases DV-Curve: A novel intuitive tool for visualizing and analyzing DNA sequences A Novel method for similarity analysis and protein sub-cellular localization prediction The Zagreb indices 30 years after On vertex-degree-based molecular structure descriptors Graphs with fixed number of pendent vertices and minimal Zeroth-order general Randic index New invariant of DNA sequences Genetic drift of human coronavirus OC43 spike gene during adaptive evolution WHO MERS-CoV global summary and risk assessment Assessing the accuracy of prediction algorithms for classification: an overview iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition Using deformation energy to analyze nucleosome positioning in genomes iRNA-PseU: identifying RNA pseudouridine sites Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve Using a Euclid distance discriminant method to find protein coding genes in the yeast genome The authors' greatest gratitude goes to the anonymous referees for their insightful suggestions and generous support. This study is undertaken to develop an efficient computational approach for timely encoding protein sequences and extracting the hidden information.
    keywords: acids; amino; dataset; dna; group; matrix; method; model; protein; sequence; vector
       cache: cord-321386-u1imic5l.txt
  plain text: cord-321386-u1imic5l.txt

        item: #91 of 119
          id: cord-321715-bkfkmtld
      author: Redelings, Benjamin D
       title: Incorporating indel information into phylogeny estimation for rapidly emerging pathogens
        date: 2007-03-14
       words: 9797
      flesch: 50
     summary: The order of sequence alignment can bias the selection of tree topology An evolutionary model for maximum likelihood alignment of DNA sequences Inching towards reality: an improved likelihood model of sequence evolution Joint Bayesian Estimation of Alignment and Phylogeny A codon-based model of nucleotide substitution for protein-coding DNA sequences Mathematical and Statistical Methods for Genetic Analysis Subtree Transfer Operations and their Induced Metrics on Evolutionary Trees Monte Carlo Strategies in Scientific Computing A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution Dating of the human-ape splitting by a molecular clock of mitochondrial DNA Wain-Hobson S: Antigenic Stimulation by BCG vaccine as an in vivo driving force for SIV replication and dissemination Evolution of a Noncoding Region of the Chloroplast Genome Gaps as characters in sequence-based phylogenetic analyses Incorporating information from length-mutational events into phylogenetic analysis The evolution of the non-coding Chloroplast DNA and its application in Plant Systematics Indel patterns of the plastid DNA trnL-trnF region within the genus Poa (Poaceae) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice MUSCLE: multiple sequence alignment with high accuracy and high throughput We would like to thank Vladimir Minin for many helpful discussions. A major advantage of this symmetry is that it is clear how to construct alignment models on an unrooted tree and leads to greater simplicity in model implementation and, arguably, decreased computation time.
    keywords: alignment; branch; codon; data; distribution; indel; information; joint; length; model; number; sequence; set; tree
       cache: cord-321715-bkfkmtld.txt
  plain text: cord-321715-bkfkmtld.txt

        item: #92 of 119
          id: cord-321762-7kiahjyy
      author: Nandy, Ashesh
       title: Chapter 5 The GRANCH Techniques for Analysis of DNA, RNA and Protein Sequences
        date: 2015-12-31
       words: 9799
      flesch: 43
     summary: Developments in the graphical representation and numerical characterization of DNA sequences raised the possibilities of using similar analysis of protein sequences, albeit with difficulty arising from the fact that now we have to contend with 20 amino acids making up a protein chain whereas DNA sequences were made up of only four nucleotides. Paper presented at the Indo-US Workshop on Mathematical Chemistry Indexing scheme and similarity measures for macromolecular sequences On 3-D representation of DNA primary sequences Novel analysis of DNA and Protein sequences through Graphical Representation and Numerical Characterization techniques Novel Techniques of Graphical Representation and Analysis of DNA Sequences -A Review Visualization and analysis of DNA sequences using DNA walks Mathematical descriptors of DNA sequences: development and applications New Approaches to Drug-DNA Interactions Based on Graphical Representation and Numerical Characterization of DNA Sequences Graphical representation and mathematical characterization of protein sequences and applications to viral proteins DNA Sequence Visualization Characterizations of DNA Primary Sequences Molecular Descriptors for Chemoinformatics, Methods and Principles in Medicinal Chemistry Genome analysis: A new approach for visualisation of sequence organisation in genomes Mathematical characterisation of chaos game representation: New algorithms for nucleotide sequence analysis Chaos game representation of similarities and differences between genomic sequences H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences Random walk and gap plots of DNA sequences Graphical analysis of DNA sequence structure: III.
    keywords: analysis; bases; descriptors; dna; dna sequences; et al; gene; graphical; method; numerical; protein; protein sequences; representation; sequences
       cache: cord-321762-7kiahjyy.txt
  plain text: cord-321762-7kiahjyy.txt

        item: #93 of 119
          id: cord-324021-y1vr1db0
      author: Kozak, M.
       title: Determinants of translational fidelity and efficiency in vertebrate mRNAs
        date: 1994-12-31
       words: 5083
      flesch: 38
     summary: The scanning model for translation: an update A consideration of alternative models for the initiation of translation in eukaryotes Thyroid hormone receptor transcriptional activity is potentially autoregulated by truncated forms of the receptor Tracheal U (1992) N-terminal truncation of salmon calcitonin leads to calcitonin antagonists Mutation eliminating mitochondrial leader sequence of methylmalonyl-CoA mutase causes mut° methylmalonic acidemia Translation of insulin-related polypeptides from messenger RNAs with tandemly reiterated copies of the ribosome binding site Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells An analysis of 5' non-coding sequences from 699 vertebrate messenger RNAs Context effects and (inefficient) initiation at non-AUG codons in eukaryotic cell-free translation systems Expression of bacterial chitinase protein in tobacco leaves using two photosynthetic gene promoters Cavener DR (1991) Translation initiation in Drosophila melanogaster is reduced by mutations upstream of the AUG initiator codon Mutational analysis of the HIS4 translational initiator region in Saccharomyces cerevisiae Influence of the three nucleotides upstream of the initiation codon on expression of the E. coli lacZ gene in S. cerevisiae. structural protein initiation codons: effects on regulation of synthesis and biological activity Human gene mutations affecting RNA processing and translation An initiation codon mutation in CD18 in association with the moderate phenotype of leukocyte adhesion deficiency Enhanced translational efficiency of a novel transforming growth factor β3 mRNA in human breast cancer cells Effect of growth hormone on levels of differentially processed IGF-I
    keywords: aug; aug codon; codon; context; initiation; leader; mrna; non; protein; sequence; structure; translation
       cache: cord-324021-y1vr1db0.txt
  plain text: cord-324021-y1vr1db0.txt

        item: #94 of 119
          id: cord-324216-ce3wa889
      author: Wang, Zheng
       title: Resequencing microarray probe design for typing genetically diverse viruses: human rhinoviruses and enteroviruses
        date: 2008-12-01
       words: 5213
      flesch: 45
     summary: The limited number of HRV sequences available in GenBank at the time RPM-Flu v.30/31 was designed meant that a few of the targets represented on RPM-Flu v.30/31 are shorter than 200 bp. A minimal number of probe sequences (26 for HRV and 13 for HEV), which were potentially capable of detecting all serotypes of HRV and HEV, were determined and implemented on the Resequencing Pathogen Microarray RPM-Flu v.30/31 (Tessarae RPM-Flu).
    keywords: base; design; hev; hrv; microarray; prototype; resequencing; rpm; sequences; serotypes; strains
       cache: cord-324216-ce3wa889.txt
  plain text: cord-324216-ce3wa889.txt

        item: #95 of 119
          id: cord-325043-vqjhiv7p
      author: Gorbalenya, Alexander E.
       title: An NTP-binding motif is the most conserved sequence in a highly diverged monophyletic group of proteins involved in positive strand RNA viral replication
        date: 1989
       words: 6807
      flesch: 42
     summary: In fact, in recent studies, protein sequences were searched for the A consensus alone as the B consensus in its loosest form is obviously too degenerate to be unequivocally recognized, except in a family of diverged proteins (see below). Protein sequences were extracted from the current literature (for references see Table 1 ).
    keywords: consensus; et al; families; family; motif; ntp; proteins; residues; rna; sequence; viruses
       cache: cord-325043-vqjhiv7p.txt
  plain text: cord-325043-vqjhiv7p.txt

        item: #96 of 119
          id: cord-325750-x7jpsnxg
      author: Mokili, John L
       title: Metagenomics and future perspectives in virus discovery
        date: 2012-01-20
       words: 8747
      flesch: 36
     summary: No association of xenotropic murine leukemia virus-related viruses with prostate cancer Reliability and reproducibility issues in DNA microarray measurements Efficient isolation of genes differentially expressed on cellulose by suppression subtractive hybridization in Agaricus bisporus Virus discovery by sequence-independent genome amplification Suppression subtraction hybridization (SSH) and macroarray techniques reveal differential gene expression profiles in brain of sea bream infected with nodavirus Suppression subtractive hybridization: a versatile method for identifying differentially expressed genes Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi's sarcoma A novel DNA virus (TTV) associated with elevated transaminase levels in posttransfusion hepatitis of unknown etiology Identification of two flavivirus-like genomes in the GB hepatitis agent STAT1-dependent innate immunity to a Norwalk-like virus Sequence-independent, single-primer amplification (SISPA) of complex DNA populations Metagenomics and the molecular identification of novel viruses Viruses in the faecal microbiota of monozygotic twins and their mothers Hepatitis E virus (HEV): the novel agent responsible for enterically transmitted non-A, non-B hepatitis The isolation and characterization of a Norwalk virus-specific cDNA Identification of a novel astrovirus (astrovirus VA1) associated with an outbreak of acute gastroenteritis Detection of a novel astrovirus in brain tissue of mink suffering from shaking mink syndrome by use of viral metagenomics A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species Laboratory procedures to generate viral metagenomes An excellent compilation of standard operating procedures to perform metagenomic analysis on different types of samples The marine viromes of four oceanic regions Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing Analysis of the virus population present in equine faeces indicates the presence of hundreds of uncharacterized virus genomes Multiple diverse circoviruses infect farm animals and are commonly found in human and chimpanzee feces Bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses Viral diversity and dynamics in an infant gut RNA viral community in human feces: prevalence of plant pathogenic viruses Viral communities associated with healthy and bleaching corals Metagenomic analysis of stressed coral holobionts Assembly of viral metagenomes from Yellowstone hot springs Using pyrosequencing to shed light on deep mine microbial ecology Microbes and Health Sackler Colloquium: metagenomic detection of phage-encoded platelet-binding factors in the human oral cavity Extraction of high molecular weight genomic DNA from soils and sediments Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing Whole transcriptome amplification for gene expression profiling and development of molecular archives Single virus genomics: a new tool for virus discovery Flow cytometric detection of viruses DNA sequencing with chain-terminating inhibitors Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses Arbovirus detection in insect vectors by rapid, high-throughput pyrosequencing Isolation and characterization of Solenopsis invicta virus 3, a new positive-strand RNA virus infecting the red imported fire ant, Solenopsis invicta A new arenavirus in a cluster of fatal transplant-associated diseases Genomic and phylogenetic characterization of Merino Walk virus, a novel arenavirus isolated in South Africa Parallel tagged sequencing on the 454 platform Targeted high-throughput sequencing of tagged nucleic acid samples The history of pyrosequencing A new method of sequencing DNA The not so universal tree of life or the place of viruses in the living world Reasons to include viruses in the tree of life Viral genomes are part of the phylogenetic tree of life There is no such thing as a tree of life (and of course viruses are out!) In this article, we review virus discovery techniques with a focus on metagenomic approaches that employ high-throughput sequencing technologies to characterize novel viruses.
    keywords: analysis; approach; characterization; culture; discovery; disease; dna; human; identification; koch; metagenomic; methods; molecular; novel; samples; sequence; sequencing; virus discovery; viruses
       cache: cord-325750-x7jpsnxg.txt
  plain text: cord-325750-x7jpsnxg.txt

        item: #97 of 119
          id: cord-325985-xfzhn1n1
      author: Jabado, Omar J.
       title: Comprehensive viral oligonucleotide probe design using conserved protein regions
        date: 2007-12-13
       words: 4266
      flesch: 41
     summary: All four subtypes were subjected to the same three-step design method: identification of conserved regions, extraction of nucleotide probe sequences, and minimization of covering probes. All probe sequences were compared to the non-redundant set of viral sequences by BLASTN (37).
    keywords: database; design; method; motif; nucleic; pfam; probe; protein; sequences; viral; virus
       cache: cord-325985-xfzhn1n1.txt
  plain text: cord-325985-xfzhn1n1.txt

        item: #98 of 119
          id: cord-326225-crtpzad7
      author: Neill, John D.
       title: Simultaneous rapid sequencing of multiple RNA virus genomes
        date: 2014-06-01
       words: 3807
      flesch: 49
     summary: These include methodologies based on PCR amplification of viral sequences, either in fragments (Rao et al., 2013) or as full-length genome amplification (Christenbury et al., 2010). This was modified for amplification of viral sequences from serum to include a step where DNase I was used to first degrade host DNA (Allander et al., 2001).
    keywords: dna; genomic; library; rna; sequences; sequencing; viruses
       cache: cord-326225-crtpzad7.txt
  plain text: cord-326225-crtpzad7.txt

        item: #99 of 119
          id: cord-328259-3g4klpyg
      author: Guajardo-Leiva, Sergio
       title: Metagenomic Insights into the Sewage RNA Virosphere of a Large City
        date: 2020-09-21
       words: 7642
      flesch: 43
     summary: Viral sequences can also be misannotated to homologous cellular genes [36, 39], which relies on the low number and diversity of viral sequences in the databases. Viral sequences identified as Partitiviridae-like viruses included in the unclassified RNA viruses ShiM-2016 category in the NCBI taxonomy (~25% abundance; Figure 2B) and Totiviridae family were also highly abundant in treated and untreated sewage samples from the EU
    keywords: abundance; database; family; figure; human; ncbi; proteins; rdrp; rna; rotavirus; samples; sequences; sewage; trebal; viral; viruses; wastewater
       cache: cord-328259-3g4klpyg.txt
  plain text: cord-328259-3g4klpyg.txt

        item: #100 of 119
          id: cord-328644-odtue60a
      author: Comandatore, Francesco
       title: Insurgence and worldwide diffusion of genomic variants in SARS-CoV-2 genomes
        date: 2020-05-28
       words: 6537
      flesch: 37
     summary: If a functional role for this mutation is demonstrated, this pattern seems to indicate that different variants might have different fitness when interacting with different hosts' haplotypes, i.e. if Asian and European populations have different haplotypes for some of the proteins interacting with the Spike, for instance Furin. When focusing on single Clades across all macro-regions previously defined, we find a heterogeneous situation with different variants increasing in time in different countries.
    keywords: coronavirus; et al; frequency; position; present; protein; sars; sequences; spike; time; variants; virus
       cache: cord-328644-odtue60a.txt
  plain text: cord-328644-odtue60a.txt

        item: #101 of 119
          id: cord-330067-ujhgb3b0
      author: Huang, Yi
       title: CoVDB: a comprehensive database for comparative analysis of coronavirus genes and genomes
        date: 2007-10-02
       words: 3010
      flesch: 50
     summary: During the process of coronavirus gene sequence analysis, we encountered a major problem when coronavirus gene sequences, especially those of orf1ab, were used for BLAST searches against GenBank or any other coronavirus databases. The main goal for setting up CoVDB is to provide a convenient and efficient platform for retrieving batches of coronavirus gene sequences.
    keywords: coronavirus; covdb; genes; genome; group; proteins; sequence
       cache: cord-330067-ujhgb3b0.txt
  plain text: cord-330067-ujhgb3b0.txt

        item: #102 of 119
          id: cord-330312-1pjolkql
      author: Liu, Y.-T.
       title: Infectious Disease Genomics
        date: 2017-01-20
       words: 5181
      flesch: 36
     summary: 16, 17 The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002. In order to understand potential functions of human genes through comparative sequence analyses, they also advised that the HGP must not be restricted to the human genome and should include model organisms including mouse, bacteria, yeast, fruit fly, and worm.
    keywords: acid; artemisinin; disease; genome; hgp; human; influenza; malaria; parasites; project; sequence; sequencing; vaccine; vector; virus
       cache: cord-330312-1pjolkql.txt
  plain text: cord-330312-1pjolkql.txt

        item: #103 of 119
          id: cord-331698-rwow1ydx
      author: Latorre-Pérez, Adriel
       title: A lab in the field: applications of real-time, in situ metagenomic sequencing
        date: 2020-08-20
       words: 6734
      flesch: 31
     summary: ONT metagenomic sequencing results were similar to those obtained with Illumina 16S rRNA sequencing, but a reduced time was achieved using MinION. The next-generation sequencing revolution and its impact on genomics Actionable diagnosis of neuroleptospirosis by next-generation sequencing Analysis of culture-dependent versus culture-independent techniques for identification of bacteria in clinically obtained bronchoalveolar lavage fluid Nanopore sequencing as a rapidly deployable Ebola outbreak tool Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella Rapid identification of pathogens from positive blood culture bottles with the MinION nanopore sequencer Rapid nanopore sequencing of plasmids and resistance gene detection in clinical isolates Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis Metagenomic arbovirus detection using MinION nanopore
    keywords: 16s; analysis; applications; dna; identification; microbial; minion; nanopore; read; samples; sequences; sequencing; situ; technologies; time
       cache: cord-331698-rwow1ydx.txt
  plain text: cord-331698-rwow1ydx.txt

        item: #104 of 119
          id: cord-334127-wjf8t8vp
      author: Brister, J. Rodney
       title: NCBI Viral Genomes Resource
        date: 2015-01-28
       words: 3865
      flesch: 25
     summary: Given the difficulty of implementing a purely well annotated representation of viral genome sequences, the viral RefSeq model has evolved into a more flexible approach that includes both reference and representative sequences. The growing cloud of viral genome sequences also poses significant barriers to the maintenance of reference genome records.
    keywords: data; genome; records; reference; refseq; resource; sequence; species; taxonomy; virus; viruses
       cache: cord-334127-wjf8t8vp.txt
  plain text: cord-334127-wjf8t8vp.txt

        item: #105 of 119
          id: cord-334394-qgyzk7th
      author: Edgar, Robert C.
       title: Petabase-scale sequence alignment catalyses viral discovery
        date: 2020-08-10
       words: 8139
      flesch: 49
     summary: Innovative fields such as high-throughput functional viromics [39] leverage these broad and rapidly growing collections of viral sequences, and can inform evidence-based policies responding to emerging pandemics [40, 41] . Accurate annotation of CoV genomes is challenging due to ribosomal frameshifts and polyproteins which are cleaved into maturation proteins [56] , and thus previously-annotated viral genomes offer a guide to accurate gene-calls and protein functional predictions.
    keywords: alignment; annotation; assembly; contigs; cov; coverage; data; family; figure; genome; identity; rdrp; reads; reference; rna; sequence; sequencing; serratus; sra; study; tree; virus
       cache: cord-334394-qgyzk7th.txt
  plain text: cord-334394-qgyzk7th.txt

        item: #106 of 119
          id: cord-338207-60vrlrim
      author: Lefkowitz, E.J.
       title: Virus Databases
        date: 2008-07-30
       words: 7958
      flesch: 45
     summary: Extensible markup language (XML) is another widely used format for storing database information. The original data may be faulty: using sequence data as one example, nucleotides in a DNA sequence may have been misread or miscalled, or someone may even have mistyped the sequence.
    keywords: biological; data; database; genbank; gene; information; ncbi; protein; record; sequence; table; viral; virus; viruses
       cache: cord-338207-60vrlrim.txt
  plain text: cord-338207-60vrlrim.txt

        item: #107 of 119
          id: cord-339209-oe8onyr9
      author: Vasilakis, Nikos
       title: Mesoniviruses are mosquito-specific viruses with extensive geographic distribution and host range
        date: 2014-05-20
       words: 5821
      flesch: 41
     summary: Mesoniviridae: a proposed new family in the order Nidovirales formed by a single species of mosquito-borne viruses Examining landscape factors influencing relative distribution of mosquito genera and frequency of virus infection Discovery of the first insect nidovirus, a missing evolutionary link in the emergence of the largest RNA virus genomes An insect nidovirus emerging from a primary tropical rainforest Identification and characterization of genetically divergent members of the newly established family Mesoniviridae Molecular biology and pathogenesis of roniviruses A new nidovirus (NamDinh virus NDiV): its ultrastructural characterization in the C6/36 mosquito cell line A new species of mesonivirus from the Northern Territory, Australia Supramolecular architecture of severe acute respiratory syndrome coronavirus revealed by electron cryomicroscopy Rtips: fast and accurate tools for RNA 2D structure prediction using integer programming A Wolbachia symbiont in Aedes aegypti limits infection with dengue, Chikungunya, and Plasmodium The relative importance of innate immune priming in Wolbachia-mediated dengue interference The native Wolbachia endosymbionts of Drosophila melanogaster and Culex quinquefasciatus increase host resistance to West Nile virus infection Negevirus: a proposed new taxon of insect-specific viruses with wide geographic distribution The footprint of genome architecture in the largest genome expansion in RNA viruses Isolation of a Singh's Aedes albopictus cell clone sensitive to dengue and Chikungunya viruses SMART 7: recent updates to the protein domain annotation resource SMART, a simple modular architecture research tool: identification of signaling domains MUSCLE: multiple sequence alignment with high accuracy and high throughput New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0 TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing SSE: a nucleotide and amino acid sequence analysis platform Mesoniviruses are mosquito-specific viruses with extensive geographic distribution and host range Additional file 5: Figure S5. The compiled sequences had their relationship to other viruses determined by a BLASTX search.
    keywords: alignment; analysis; conserved; domains; figure; genome; isolates; mesoniviruses; ndiv; orf1a; region; sequence; species; structure
       cache: cord-339209-oe8onyr9.txt
  plain text: cord-339209-oe8onyr9.txt

        item: #108 of 119
          id: cord-339915-8j04y50s
      author: Deng, Wei
       title: DV-Curve Representation of Protein Sequences and Its Application
        date: 2014-05-08
       words: 2960
      flesch: 45
     summary: A novel 2-D graphical representation of DNA sequences of low degeneracy On the uniqueness of quantitative DNA difference descriptions in 2D graphical representation models Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation A class of new 2-D graphical representation of DNA sequences and their application Graphical representations of DNA as 2-D map H-L curve: a novel 2D graphical representation for DNA sequences DV-Curve: a novel intuitive tool for visualizing and analyzing DNA sequences Analysis of similarity/dissimilarity of DNA sequences based on chaos game representation A 3D graphical representation of DNA sequences and its application A group of 3D graphical representation of DNA sequences based on dual nucleotides New graphical representation of a DNA sequence based on the ordered dinucleotides and its application to sequence analysis Analysis of similarity/dissimilarity of DNA sequences based on a condensed curve representation Novel 4D numerical representation of DNA sequences On the similarity of DNA primary sequences based on 5-D representation Analysis of similarity/dissimilarity of DNA sequences based on nonoverlapping triplets of nucleotide bases Unique graphical representation of protein sequences based on nucleotide triplet codons A 2-D graphical representation of protein sequences based on nucleotide triplet codons Protein-based phylogenetic analysis by using hydropathy profile of amino acids 2-D Graphical representation of proteins based on physico-chemical properties of amino acids 2-D graphical representation of protein sequences and its application to coronavirus phylogeny New 3-D graphical representation of protein sequences and its application A 2D graphical representation of protein sequence and its numerical characterization Similarity/dissimilarity studies of protein sequences based on a new 2D graphical representation New technique: protein sequence analysis based on hydropathy profile of amino acids 3D graphical representation of protein sequences and their statistical characterization Similarity/dissimilarity analysis of protein sequences using the spatial median as a descriptor Modeling study on the validity of a possibly simplified representation of proteins On 3-D graphical representation of DNA primary sequences and their numerical characterization Novel 2-D graphical representation of DNA sequences and their numerical characterization Compact 2-D graphical representation of DNA Application of 2-D graphical representation of DNA sequence On the complexity of multiple sequence alignment A probabilistic measure for alignment-free sequence comparison An information-based sequence distance and its application to whole mitochondrial genome phylogeny A new sequence distance measure for phylogenetic tree construction A weighted least-squares approach for inferring phylogenies from incomplete distance matrices A novel coronavirus associated with severe acute respiratory syndrome The genome sequence of the SARS-associated coronavirus The Principles and Practice of Numerical Classification Characterization of a novel coronavirus associated with severe acute respiratory syndrome Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats The authors thank all the anonymous reviewers for their valuable suggestions and support. Comput Math Methods Med, DOI: 10.1155/2014/203871 (doc_id: 339915, cord_uid: 8j04y50s). Based on the detailed hydrophobic-hydrophilic (HP) model of amino acids, we propose the dual-vector curve (DV-curve) representation of protein sequences, which uses two vectors to represent each letter of the protein sequence alphabet.
    keywords: curve; dna; protein; representation; sequences
       cache: cord-339915-8j04y50s.txt
  plain text: cord-339915-8j04y50s.txt

        item: #109 of 119
          id: cord-340907-j9i1wlak
      author: Zarai, Yoram
       title: Evolutionary selection against short nucleotide sequences in viruses and their related hosts
        date: 2020-04-27
       words: 8168
      flesch: 41
     summary: The virus and host coding sequences and association information were retrieved from a published database. We report several novel findings that may shed light on the evolution of viral DNA sequences and on the co-evolution of viruses with their respective hosts.
    keywords: analysis; codon; genes; genome; host; nucleotide; number; restriction; selection; sequences; size; viruses; zikv
       cache: cord-340907-j9i1wlak.txt
  plain text: cord-340907-j9i1wlak.txt

        item: #110 of 119
          id: cord-341564-fvuwick5
      author: Qi, Zhao-Hui
       title: Novel Method of 3-Dimensional Graphical Representation for Proteins and Its Application
        date: 2018-06-12
       words: 2660
      flesch: 50
     summary: Novel spectral representation of RNA secondary structure without loss of information Milestones in graphical bioinformatics Four-component spectral representation of DNA sequences Graphical and numerical representations of DNA sequences: statistical aspects of similarity 2D-dynamic representation of DNA sequences Spectral-dynamic representation of DNA sequences 3D-dynamic representation of DNA sequences A group of 3D graphical representation of DNA sequences based on dual nucleotides WITHDRAWN: 2-D graphical representation of proteins based on physico-chemical properties of amino acids ADLD: a novel graphical representation of protein sequences and its application Protein map: an alignment-free sequence comparison method based on various properties of amino acids An efficient numerical method for protein sequences similarity analysis based on a new two-dimensional graphical representation Graphical representation of proteins as four-color maps and their numerical characterization A protein mapping method based on physicochemical properties and dimension reduction The graphical representation of protein sequences based on the physicochemical properties and its applications F-Curve, a graphical representation of protein sequences for similarity analysis based on physicochemical properties of amino acids Analysis of similarity/dissimilarity of protein sequences The genetic code and error transmission In this article, we propose a 3-dimensional graphical representation of protein sequences based on 10 physicochemical properties of 20 amino acids and the BLOSUM62 matrix.
    keywords: amino; method; protein; representation; sequences; similarity
       cache: cord-341564-fvuwick5.txt
  plain text: cord-341564-fvuwick5.txt

        item: #111 of 119
          id: cord-341879-vubszdp2
      author: Li, Lucy M
       title: Genomic analysis of emerging pathogens: methods, application and future trends
        date: 2014-11-22
       words: 5030
      flesch: 31
     summary: Because of the simplistic assumptions of population genetics models, the population size inferred using coalescent-based methods cannot be directly interpreted as pathogen population size (prevalence of infection). Although the two approaches are methodologically different, both aim to reconstruct pathogen population history and produce estimates of epidemiological parameters, such as the basic reproductive number (R0).
    keywords: analysis; coalescent; data; disease; models; pathogen; population; sequences; time; transmission
       cache: cord-341879-vubszdp2.txt
  plain text: cord-341879-vubszdp2.txt

        item: #112 of 119
          id: cord-342785-55r01n0x
      author: Lemmon, Gordon H
       title: Predicting the sensitivity and specificity of published real-time PCR assays
        date: 2008-09-25
       words: 4319
      flesch: 46
     summary: GL found real-time PCR signatures in the literature, wrote Perl scripts, and performed the analysis of published signatures. It has been estimated that a minimum of 3-4 genomes is needed to computationally design TaqMan PCR signatures likely to detect most strains, with the isolates chosen for sequencing selected to span gradients of geographic, phenotypic, and temporal variation [19].
    keywords: assay; detection; pcr; primer; probe; sensitivity; sequences; signatures; time; virus
       cache: cord-342785-55r01n0x.txt
  plain text: cord-342785-55r01n0x.txt

        item: #113 of 119
          id: cord-343863-q1y8uscj
      author: Whitney, Joe
       title: Recent Hits Acquired by BLAST (ReHAB): A tool to identify new hits in sequence similarity searches
        date: 2005-02-08
       words: 3464
      flesch: 58
     summary: It allows the researcher to ask the question: what new sequences match my sequences since the last time I searched? ReHAB is designed to handle large numbers of query sequences, such as whole genomes or sets of genomes.
    keywords: blast; database; hits; query; rehab; results; sequences
       cache: cord-343863-q1y8uscj.txt
  plain text: cord-343863-q1y8uscj.txt

        item: #114 of 119
          id: cord-344782-ond1ziu5
      author: Zhang, Jing
       title: Identification of a novel nidovirus as a potential cause of large scale mortalities in the endangered Bellinger River snapping turtle (Myuchelys georgesi)
        date: 2018-10-24
       words: 6005
      flesch: 45
     summary: The similarity of each ORF and its predicted amino acid sequence to other viruses was determined by searches using the BLASTn and BLASTp [13] algorithms through the NCBI server (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Ball Python Nidovirus: a Candidate Etiologic Agent for Severe Respiratory Disease in Python regius Identification of a novel nidovirus in an outbreak of fatal respiratory disease in ball pythons (Python regius) Novel divergent nidovirus in a python with pneumonia Nidovirus-Associated Proliferative Pneumonia in the Green Tree Python (Morelia viridis) Discovery and partial genomic characterisation of a novel nidovirus associated with respiratory disease in wild shingleback lizards (Tiliqua rugosa) Redefining the invertebrate RNA virosphere The evolutionary history of vertebrate RNA viruses Programmed translational frameshifting Ribosomal frameshifting on viral RNAs The primary structure and expression of the second open reading frame of the polymerase gene of the coronavirus MHV-A59 a highly conserved polymerase is expressed by an efficient ribosomal frameshifting mechanism An RNA Pseudoknot in the 3' end of the Arterivirus genome has a critical role in regulating viral RNA synthesis Changes to taxonomy and the international code of virus classification and nomenclature ratified by the international committee on taxonomy of viruses Sequence-based identification of microbial pathogens: a reconsideration of Koch's postulates Molecular comparison of isolates of an emerging fish pathogen, Koi herpesvirus, and the effect of water temperature on mortality of experimentally infected Koi Is horizontal transmission of the ostreid herpesvirus OsHV-1 in Crassostrea gigas affected by unselected or selected survival status in adults to juveniles?
    keywords: acid; animals; disease; georgesi; min; nidovirus; pcr; python; river; rna; samples; sequence; species; tissues; turtle; virus
       cache: cord-344782-ond1ziu5.txt
  plain text: cord-344782-ond1ziu5.txt

        item: #115 of 119
          id: cord-345552-h6fwi0qn
      author: Li, Q.-G.
       title: Hydropathic characteristics of adenovirus hexons
        date: 1997-07-01
       words: 3524
      flesch: 48
     summary: Every hexon DNA sequence was translated to a protein sequence using the program EditSeq-Translation. Here, we report the hydropathy analysis of 14 adenovirus hexon sequences predicted from a newly determined Ad7 hexon DNA sequence and thirteen published hexon sequences of Ad2, Ad3, Ad4, Ad5, Ad12, Ad16, Ad40, Ad41, Ad48, Bav3, Mav1, Fav1 and Fav10.
    keywords: acid; adenovirus; amino; dna; hexon; regions; sequence; type
       cache: cord-345552-h6fwi0qn.txt
  plain text: cord-345552-h6fwi0qn.txt

        item: #116 of 119
          id: cord-348427-worgd0xu
      author: Hatcher, Eneida L.
       title: Virus Variation Resource – improved response to emergent viral outbreaks
        date: 2017-01-04
       words: 5555
      flesch: 43
     summary: When searching protein sequences, selecting the 'Full-length sequences only' filter limits retrieved sequences to those with a complete coding region, as determined relative to the relevant reference. Here, protein reference sequences are aligned with potential translations of the query sequence.
    keywords: annotation; data; metadata; nucleotide; protein; records; resource; search; sequences; terms; variation; virus; viruses
       cache: cord-348427-worgd0xu.txt
  plain text: cord-348427-worgd0xu.txt

        item: #117 of 119
          id: cord-353290-1wi1dhv6
      author: Kustin, Talia
       title: Biased mutation and selection in RNA viruses
        date: 2020-09-28
       words: 7615
      flesch: 42
     summary: One major challenge in tackling RNA viruses is the fact that they are extremely genetically diverse. RNA viruses are an extremely diverse collection of entities, spanning a wide range of hosts, morphologies, genome organizations, and genetic compositions.
    keywords: bias; branches; codon; fig; genomes; host; mutation; nucleotide; rna; selection; sequences; usage; viruses
       cache: cord-353290-1wi1dhv6.txt
  plain text: cord-353290-1wi1dhv6.txt

        item: #118 of 119
          id: cord-354465-5nqrrnqr
      author: Haslinger, Christian
       title: RNA structures with pseudo-knots: Graph-theoretical, combinatorial, and statistical properties
        date: 1999
       words: 10375
      flesch: 61
     summary: A new principle of RNA folding based on pseudoknotting Random induced subgraphs of generalized n-cubes Bio-molecular shapes and algebraic structures Generic properties of combinatory maps: Neural networks of RNA secondary structures Petersen family minors Sachs' linkless embedding conjecture Linear trees and RNA secondary structure How to search for RNA structures. Combinatorial aspects of RNA secondary structures have been studied in detail by Waterman and co-workers (Stein and Waterman, 1978; Waterman, 1978; Waterman and Smith, 1978a, b; Penner and Waterman, 1993;
    keywords: base; diagram; energy; graph; knots; neutral; number; pseudo; rna; sequences; structures; vertices
       cache: cord-354465-5nqrrnqr.txt
  plain text: cord-354465-5nqrrnqr.txt

        item: #119 of 119
          id: cord-355075-ieb35upi
      author: Papenfuss, Anthony T
       title: The immune gene repertoire of an important viral reservoir, the Australian black flying fox
        date: 2012-06-20
       words: 8959
      flesch: 48
     summary: The GO classification demonstrates that a diverse range of genes was identified in each of our two datasets, providing a broad survey of bat genes. We have also begun to identify some of the genes involved in immune responses in this species and to carry out functional studies in bat cells.
    keywords: alecto; antiviral; bat; bats; cells; class; contigs; datasets; genes; immune; mammals; mhc; protein; receptors; sequences; species; thymus; transcriptome; transcripts; viruses
       cache: cord-355075-ieb35upi.txt
  plain text: cord-355075-ieb35upi.txt